On the Recursive Definition of a Format for Communication

Leonid N. Sumarokov: Head, Research Department, International Center for Scientific Information, Moscow, USSR

A recursive presentation of a communication format is discussed and a form of pertinent notation proposed. Recursive notation permits presentation of an interchange format in more general terms than heretofore published, and expands application possibilities.

The development of the forms of exchange of information among documentation systems, and particularly the development of the technique of recording machine-readable bibliographic data on magnetic tape, has led to the requirement for the adoption of an agreement on a standard for a format for communication. Thus, the problem of a format for communication reflects the existing tendency toward ensuring compatibility among formats. At the present time the greatest impact on world information practice has been caused by the American National Standards Institute (ANSI) standard for bibliographic information interchange on magnetic tape (1) and the several implementations of that standard: MARC, INIS, COSATI, and others. It should be noted that, despite numerous existing peculiarities, in principle there is no difference in structure among the formats.

One of the most important requisites for a communication format is universality. The practice of processing large quantities of information has emphasized the flexibility of the above-mentioned formats; their use has permitted identification of huge numbers of documentary materials in various forms, thereby creating the impression that the structure of the format has been developed to such an extent that it can be canonized for any application. It must be said that support or rejection of this impression can be based only upon future experience in the application of a communication format. Nevertheless, it appears expedient to generalize about the structure of a communication format by making a few preliminary remarks and thereby contributing toward expanding the sphere of its application.

The remarks deal with the following. In the existing systems for interchanging information on magnetic tape, the document is the object of identification. With the development of data banks, the characteristics of the objects to be identified may prove to be so varied, even though presented in the proper documentary form, that their uniform presentation will cause difficulties. (Actually, examples can be given of data banks in which data appear in the capacity of objects: information regarding firms, rivers, products of the electrical engineering industry, etc.) Furthermore, even if it is possible to identify in principle a certain object with the aid of the format, one must distinguish between the question of possible identification in principle and that of the optimal (or rational) form of identification in view of the limitations of a certain system.

The recursive notation of a communication format is presented below. Certain definitions and ideas in general are used as source material for such a notation, following the American standard for bibliographic information interchange on magnetic tape (1).
It must be conceded that the use of one term or another for defining individual elements of a notation, as well as the general structure of the entire notation, is not the principal subject of discussion here; this means that any change, either in definition or, to a certain extent, in the structure of the notation, will not affect the proposed form of the notation. Consequently, this article does not pretend to describe a certain universal structure for a communication format. It has a different purpose, viz., to point out the wider perspectives that unfold by applying a recursive presentation of notations in formats, admitting an object of any hierarchical depth.

For the following symbols, explanations can be found in the ANSI standard (1):

R = record
L = leader
DR = directory
T = tag
D = data, or data elements
FT = field terminator, or field separator
RT = record terminator, or record separator

The concept TT used below, standing for tag terminator, is analogous to FT and RT. So also is the concept SF, meaning specific fields for defining contents, which did not appear in the proposed notation although utilized in actual formats. The following symbols are also used:

TG = tag, generalized
F = field
DF = data field
BF = bibliographic fields

Utilization of a special bracket notation (analogous to the form used in algorithmic languages) enables R to be defined in the form of the following consecutive structure:

1) R = [L] [DR] [SF] [BF]

The symbols written in brackets after the equal sign maintain the relationship of priority. Further, the recursive universal tag TG is defined as follows:

2) TG = [T; TT]

Such a notation indicates that the expression in brackets is T or TT. The recursiveness of the notation indicates that TG may be t1 t2 ... tp : TT, where p is any whole number larger than or equal to one. (Obviously p defines the depth of the hierarchic description in accordance with the given characteristic.) Finally:

3) F = [TG] [D];
4) DF = [F; FT];
5) BF = [DF; RT].

Thus, the general notation of the format is expressed by 1), in which the element BF, which constitutes the basic part of the so-called alternate fields, is expressed recursively with the aid of the system 2)-5). As is evident, the quantity of F in DF and of DF in BF, as well as the number of subscripts of TG, can be an arbitrary whole number, changing from notation to notation.
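To make the recursive definition concrete, the following short Python sketch models one reading of expressions 1) through 5): a generalized tag TG is a chain of tags t1 t2 ... tp closed by a tag terminator, a field F is a generalized tag followed by its data, and the bibliographic fields BF are a sequence of fields, each closed by a field terminator, with a record terminator at the end. The class names, the terminator characters, and the sample tag values are illustrative assumptions only; they are not taken from the article or from the ANSI standard.

# Illustrative sketch of the recursive notation 1)-5) above.
# The terminator characters are placeholders, not the control
# characters assigned by the ANSI standard.
from dataclasses import dataclass
from typing import List

TAG_TERMINATOR = ":"      # TT (placeholder)
FIELD_TERMINATOR = "|"    # FT (placeholder)
RECORD_TERMINATOR = "#"   # RT (placeholder)

@dataclass
class GeneralizedTag:
    # TG = [T;TT]: a chain of tags t1 t2 ... tp (p >= 1) closed by TT.
    parts: List[str]

    def encode(self) -> str:
        return "".join(self.parts) + TAG_TERMINATOR

@dataclass
class Field:
    # F = [TG][D]: a generalized tag followed by its data element.
    tag: GeneralizedTag
    data: str

    def encode(self) -> str:
        return self.tag.encode() + self.data

def encode_bibliographic_fields(fields: List[Field]) -> str:
    # DF = [F;FT] and BF = [DF;RT]: each field closed by FT,
    # the whole sequence closed by RT.
    return "".join(f.encode() + FIELD_TERMINATOR for f in fields) + RECORD_TERMINATOR

# Example: a field whose tag has hierarchic depth p = 2 (values are invented).
example = Field(tag=GeneralizedTag(["100", "a"]), data="Sumarokov, Leonid N.")
print(encode_bibliographic_fields([example]))   # 100a:Sumarokov, Leonid N.|#

Under this reading, the element BF in expression 1) can carry objects of any hierarchic depth simply by lengthening the tag chain, which is the generalization the article argues for.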
Reference

1. "USA Standard for a Format for Bibliographic Information Interchange on Magnetic Tape," Journal of Library Automation, 2 (June 1969), 53-65.

Editorial: First Have Something to Say
John Webb

John Webb (jwebb@wsu.edu) is Assistant Director for Digital Services/Collections, Washington State University Libraries, Pullman, and Editor of Information Technology and Libraries.

I think that writing editorials in my job as the new editor of Information Technology and Libraries (ITAL) is going to be a real piece of cake. All I have to do, dear readers, is to quote (with proper attribution) Walt Crawford, the title of whose book I repeat as the title of this, my inaugural editorial.1 And then quote other sages of our profession, using only as many of their words as is fitting and proper to make my editorials relevant to the concerns of our membership and readers, and as few of my own words as I can, to repay the confidence that the Library and Information Technology Association (LITA) has placed in me, and to avoid muddling the ideas of those to whom I shall be indebted. Those of you reading this will note that I have already fallen prey to the conceit of all scholarly journal editors: that their readers, of course, after surveying the tables of contents, dive wide-eyed first into the editorials. Of course.

To paraphrase a technologist of an earlier era, "when in the course of human events, it becomes necessary for" a new editor to take on the responsibility for the stewardship of ITAL, "a decent respect to the opinions of mankind requires that" he "should declare the causes which impel" him to accept that responsibility and, further, to write editorials. I quote, of course, from the first paragraph of the Declaration of Independence adopted by the "thirteen united States of America" July 4, 1776. In this, my first editorial, I, too, shall put forth for the examination of the members of LITA and the readers of ITAL my goals and hopes for the journal that I am now honored to lead. These goals and hopes are shared by the members of the ITAL Editorial Board, whose names appear in the masthead of this journal.

ITAL is a double-blind refereed journal that currently has a manuscript acceptance rate of 50 percent. It began in 1968 as the Journal of Library Automation (JOLA), the journal of the Information Science and Automation Division (ISAD) of ALA, and its first editor was Fred Kilgour. In 1978 ISAD became LITA, and in 1982 the journal title was changed to reflect the expanding role of information technology in libraries, an expansion that continues to accelerate, so that ITAL is no longer the only professional journal within ALA whose pages are dominated by our accelerating use of information technologies as tools to manage the services we provide our users and as tools we use ourselves to accomplish our daily duties.

I write part of this editorial in the skies over the middle section of the United States as I return home from the seventh national LITA Forum, held in St. Louis, October 7-10. At the Forum, I heard presentations, visited poster sessions, and talked with colleagues from forty-four states and six countries who had something to say and said it well. I hope that some of them may submit manuscripts to ITAL so that all the members of LITA and all the readers of the journal will profit as well from some of what the attendees of the Forum heard and saw. I attended the Forum forewarned by previous ITAL editors to carry plenty of business cards, and I went armed with a pocketful. I think I distributed enough that, if pieced together, their blank sides would provide sufficient writing space for at least one manuscript!

In an attempt to fulfill the Jeffersonian promise above, I hereby list a few of my goals for the beginning of my term as editor. I must emphasize that these goals of mine supplement but do not supplant the purposes of the journal as stated on the first page and on the ITAL web site (www.ala.org/lita/litapublications/ital/italinformation.htm); likewise, they do not supplant the goals of my predecessors. In no particular order:

I hope to increase the number of manuscripts received from our library and information schools. Their faculty and doctoral students are some of the incubators of new and exciting information technologies that may bear fruit for future library users. However, not all research turns up maps on which "X marks the spot." Exploration is interesting, even vital, for the journey, for the search itself, and our graduate faculties and students have something to say.

I hope to increase the submission of manuscripts that describe relevant sponsored research. In the earlier volumes, JOLA had an average of at least one article per issue, maybe more, describing the results of funded research.
ITAL can and should be a source that information-technology researchers consider as a vehicle for the publication of their results. Two articles in this issue result from sponsored research. In fact, I hope to increase the number of manuscripts that describe any relevant research or cutting-edge developments. Much of the exploration undertaken by librarians improving and strengthening their services involves research or problems solved on both small scales and large. Neither the officers of LITA, the referees, the readers, nor I are interested in very many "how I run my library good" articles. We all want to read a statement of the problem(s), the hypotheses developed to explore the issues surrounding the problem(s), the research methods, the results, the assessment of the outcomes, and, when feasible, a synthesis of how the research methods or results may be generalized.

I hope to increase the number of articles with multiple authors. Libraries are among society's most cooperative institutions, and librarians are members of one of the most cooperative of professions. The work we do is rarely that of solitary performers, whether it be research or the design and implementation of complex systems to serve our users. Writing about that should not be solitary either.

I hope to publish think-pieces from leaders in our field. I hope to publish more articles on the management of information technologies. I hope to increase the number of manuscripts that provide retrospectives. Libraries have always been users of information technologies, often early adopters of leading-edge technologies that later become commonplace. We should, upon occasion, remember and reflect upon our development as an information-technology profession.

I hope to work with the Editorial Board, the LITA Publications Committee, and the LITA Board to find a way, and soon, to facilitate the electronic publication of articles without endangering, but in fact enhancing, the absolutely essential financial contribution that the journal provides to the association. In short, I want to make ITAL a destination journal of excellence for both readers and authors, and in doing so reaffirm the importance of LITA as a professional division of ALA.
To accomplish my goals, I need more than an excellent editorial board, more than first-class referees to provide quality control, and more than the support of the LITA officers. I need all LITA members to be prospective authors, prospective referees, and prospective literary agents acting on behalf of our profession to continue the almost forty-year tradition begun by Fred Kilgour and his colleagues, who were our predecessors in volume 1, number 1, March 1968, of our journal.

Reference

1. Walt Crawford, First Have Something to Say: Writing for the Library Profession (Chicago: ALA, 2003).

Wireless Networks in Medium-Sized Academic Libraries | Barnett-Ellis and Charnigo

__ Problems with unauthorized people accessing the Internet through the wireless network
__ Problems with restricted parts of the network being accessed by unauthorized users
__ Other

3. How were security problems resolved?

Benefits of Use of Network

1. What have been the biggest benefits of wireless technology? Check all that apply.
__ User satisfaction
__ Increased access to the Internet and online sources
__ Flexibility and ease due to lack of wires
__ Has improved technical services (use for library functions)
__ Has aided in bibliographic instruction
__ Provides access beyond the library building
__ Allows students to roam the stacks while accessing the network
__ Other

2. How would you describe current usage of the network?
__ Heavy
__ Moderate
__ Low

3. In your opinion, has this technology been worth the benefit-cost ratio thus far?
__ Yes
__ No
__ Not sure

4. What advice would you give to librarians considering this technology?

Editorial Board Thoughts
Kyle Felker

Editor's note: We have an excellent editorial board for this journal, and with this issue we've decided to begin a new column. In each issue of ITAL, one of our board members will reflect on some question related to technology and libraries. We hope you find this new feature thought-provoking. Enjoy!

Kyle Felker (felkerk@wlu.edu) is an ITAL Editorial Board member, 2007-09, and Technology Coordinator at Washington and Lee University Library in Lexington, Virginia.

Any librarian who has been following the professional literature at all in the past ten years knows that there has been an increasing emphasis on user-centeredness in the design and creation of library services. Librarians are trying to understand and even anticipate the needs of users to a degree that's perhaps unprecedented in the history of our profession. It's no mystery as to why. We now live in a world where global computer networks link users directly with information in such a way that often no middleman is required. Users are exploring information on their own terms, at their own convenience, sometimes even using technologies and systems that they themselves have designed or contributed to.

At the same time, most libraries are feeling a financial pinch. Resources are tight, and local governments, institutions of higher education, and corporations are all scrutinizing their library operations more closely, asking "What have you done for me lately?" The unspoken coda is "It better be something good, or I'm cutting your funding." The increasing need to justify our existence, together with our desire to build more relevant services, is driving an increased interest in assessment. How do we know when we've built a successful service? How do we define "success"? And, perhaps most importantly, in a world filled with technologies that are "here today, gone tomorrow," how do we decide which ones are appropriate to build into enduring and useful services?

As a library technologist, it's this last question that concerns me the most. I'm painfully aware of how quickly new technologies develop, mature, and fade silently into that good night with nary a trace. It's like watching protozoa under a microscope. Which of these can serve as the foundation for real, useful services? It's obvious to me that if I'm going to choose well, it's vital that I place these services in context, and not my context but the user context. In order to do that, I need to understand the users. How do they do their work? What are they most concerned with? How do they think about the library in relation to the research process? How do they use technology as part of that process? How does that process fit into the larger context of the assignment? To answer questions like these, librarians often turn to basic marketing techniques such as the survey or the focus group. Whether we are aware of it or not, the emphasis on user-centered design is making librarians into marketers.
This is a new role for us, and one that most of us have not had the training to cope with. Since most of us haven't been exposed to marketing as a discipline of study, we don't think of what we do as marketing, even when we use marketing techniques. But that's what it is. So whether we know it or not, marketing, particularly market research, is important to us.

Marketing as a discipline is in the process of undergoing some major changes right now. Recent research in sociology, psychology, and neuroscience has uncovered some new and often startling insights into how human beings think and make decisions. Marketers are struggling to incorporate these new models into their research methods, and to change their own thinking about how they discover what people want. I recently collided with this change when my own library decided to do a focus group to help us redesign our website. Since we have a school of business, I asked one of our marketing professors for help. Her advice? Don't do it. As she put it: "You and the users would just be trading ignorances." She then gave me a reading list, which included How Customers Think by Gerald Zaltman, which I now refer to as "the book that made marketing sexy."1

Zaltman's book pulls together a lot of the recent research on how people think, make choices, and remember. Some of it is pretty mind-blowing:

• 95 percent of human reasoning is unconscious. It happens at a level we are barely aware of.
• We think in images much more than we do in language.
• Social context, emotion, and reason are all involved in the decision-making process. Without emotion, we literally are unable to make choices.
• All human beings use metaphors to explain and understand the world around them. Metaphor is the bridge between the rational and emotional parts of the decision-making process.
• Memory is not a collection of immutable snapshots we carry around in our heads. It's much more like a narrative or story, one that we change just by remembering it. Our experiences of the past and present are inextricably linked; one is constantly influencing the other.

Heady stuff. If you follow many of these ideas to their logical conclusions, you end up questioning the value of many traditional marketing techniques, such as surveys and focus groups. For example, if the social context in which a decision is made is important, then surveys are often going to yield false data, since the context in which the person is deciding to tick off this or that box is very different from the context in which they actually decide to use or not use your service or product. Asking users "what services would be useful" in a focus group won't be effective because you are only interviewing the users' rational thought process; it's at least as important to find out how they feel about the service, your library, the task itself, and how they perceive other people's feelings on the subject.

Zaltman proposes a number of very different marketing techniques to get a more complete picture of user decision making:

• Use lengthy, one-on-one interviews. Interviewing the unconscious is tricky and takes trust; it's something you can't do in a traditional focus group setting.
• Use images. We think in images, and images are a richer field for bringing unconscious attitudes to the surface.
• Use metaphor. Invite interviewees to describe their feelings and experiences in metaphor. Explore the metaphors they come up with to more fully understand all the context.

If this sounds more like therapy than marketing to you, then your initial reaction is pretty similar to mine. But the techniques follow logically from the research Zaltman presents. How many of us have done user assessment and launched a new service, only to find a less than warm reception for it? How many of us have had users tell us they want something, only to see it go unused when it's implemented? Zaltman's model offers potential explanations for why this happens, and methods for avoiding it.

Lest you think this has nothing to do with technology, let me offer an example: library Facebook/MySpace profile pages. There's been a lot of debate on how effective and appropriate these are. It seems to me that we can't gauge how receptive users are to this unless we understand how they feel about and think about those social spaces. This is exactly the sort of insight that new marketing techniques purport to offer us. In fact, if the research is right, and there is a social and emotional component to every choice a person makes, then that applies to every choice a user makes with regard to the library, whether it's the choice to ask a question at the reference desk, the choice to use the library website, or the choice to vote on a library bond issue.

Librarians are doing a lot of things we never imagined we'd ever need or want to do. Web design. Archival digitization. Tagging. Perhaps it's also time to acknowledge that what we do has an important marketing component, and to think of ourselves as marketers (at least part time). I'm sold enough on Zaltman's ideas that I'm willing to try them out at my own institution, and I encourage you to do the same.

Reference

1. Zaltman, Gerald. How Customers Think: Essential Insights into the Mind of the Market (Boston, Mass.: Harvard Business School Press, 2003).

Editorial
Marc Truitt

Marc Truitt (marc.truitt@ualberta.ca) is Associate Director, Bibliographic and Information Technology Services, University of Alberta Libraries, Edmonton, Alberta, Canada, and Editor of ITAL.

Welcome to 2009! It has been unseasonably cold in Edmonton, with daytime "highs" (I use the term loosely) averaging around -25°C (that's -13°F, for those of you ITAL readers living in the States) for much of the last three weeks. Factor in wind chill (a given on the Canadian prairies), and you can easily subtract another 10°C. As a result, we've had more than a few days and nights where the adjusted temperature has been much closer to -40°, which is the same in either Celsius or Fahrenheit. While my boss and chief librarian is fond of saying that "real Canadians don't even button their shirts until it gets to minus forty," I've yet to observe such a feat of derring-do by anyone at much less than twenty below. Even your editor's two Labrador retrievers, who love cooler weather, are reluctant to go out in such cold, with the result that both humans and pets have been coping with bouts of cabin fever since before Christmas.

So, When Is It "Too Cold" for a Server Room?

Why, you may reasonably ask, am I belaboring ITAL readers with the details of our weather?
Over the weekend we experienced near-simultaneous failures of both cooling systems in our primary server room (SR1), which meant that nearly all of our library IT services, including our OPAC (which we host for a consortium of twenty area libraries), a separate OPAC for Edmonton Public Library, our website, and access to licensed e-resources, e-mail, files, and print servers, had to be shut down. Temperature readings in the room soared from an average of 20-22°C (68-71.5°F) to as much as 37°C (98.6°F) before settling out at around 30°C (86°F). We spent much of the weekend and the beginning of this week relocating servers to all manner of places while the cooling system gets fixed. I imagine that next we may move one into each staff person's under-heated office, where they'll be able to perform double duty as high-tech foot warmers! All of this happened, of course, while the temperature outside the building hovered between -20° and -25°C.

This is not the first time we've experienced a failure of our cooling systems during extremely cold weather. Last winter we suffered a series of problems with both the systems in SR1 and in our secondary room a few feet away. The issues we had then were not the same as those we're living through now, but they occurred, as now, at the coldest time of the year. This seeming dichotomy of an overheated server environment in the depths of winter is not a matter of accident or coincidence; indeed, while it may seem counterintuitive, the fact is that many, if not all, of our cooling woes can be traced to the cold outside. The simple explanation is that extreme cold weather stresses and breaks things, including HVAC systems.

As we've tried to analyze this incident, it appears likely that our troubles began when the older of our two systems in SR1 developed a coolant leak at some point after its last preventive maintenance servicing in August. Fall was mild here, and we didn't see the onset of really severe cold weather until early to mid-December. Since the older system is mainly intended for failover of the newer one, and since both systems last received routine service recently, it is possible that the leak could have developed at any time since, although my supposition is that it may itself be a result of the cold. In any case, all seemed well because the newer cooling system in SR1 was adequate to mask the failure of the older unit, until it suffered a controller board failure that took it offline last weekend. But, with the failure of the new system on Saturday, all IT services provided from this room had to be brought down. After a night spent trying to cool the room with fans and a portable cooling unit, we succeeded in bringing the two OPACs and other core services back online by Sunday, but the coolant leak in the old system was not repaired until midday Monday. Today is Friday, and we've limped along all week on about 60 percent of the cooling normally required in SR1. We hope to have the parts to repair the newer cooling system early next week (fingers crossed!).

Some interesting lessons have emerged from this incident, and while probably not many of you regularly deal with -30°C winters, I think them worth sharing in the hope that they are more generally applicable than our winter extremes are:

1. Document your servers and the services that reside on them. We spent entirely too much time in the early hours of this event trying to relate servers and services.
We in information technology (IT) may think of shutting down or powering up servers "Fred," "Wilma," "Betty," and "Barney," but, in a crisis, what we generally should be thinking of is whether or not we can shut down e-mail, file-and-print services, or the integrated library system (ILS) (and, if the latter, whether we shut down just the underlying database server or also the related staff and public services). Perhaps your servers have more obvious names than ours, in which case, count yourself fortunate. But ours are not so intuitively named (there is a perfectly good reason for this, by the way), and with distributed applications, where the database may reside here, the application there, and the web front end yet somewhere else, I'd be surprised if your situation isn't as complex as ours. And bear in mind that documentation of dependencies goes two ways: not only do you want to know that "Barney" is hosting the ILS's Oracle database, but you also want to know all of the servers that should be brought up for you to offer ILS-related services.

2. Prioritize your services. If your cooling system (or other critical server-room utility) were suddenly operating at only 50 percent of your normal required capacity, how would you quickly decide which services to shut down and which to leave up? I wrote in this space recently that we've been thinking about prioritized services in the context of disaster recovery and business continuity, but this week's incident tells me that we're not really there yet. Optimally, I think that any senior member of my on-call staff should be empowered in a given critical situation to bring down services on the basis of a predefined set of service priorities (a minimal sketch of such a priority map follows this list).

3. Virtualize, virtualize, virtualize. If we are at all typical of large libraries in the Association of Research Libraries (and I think we are), then it will come as no surprise that we seem to add new services with alarming frequency. I suspect that, as with most places, we tend to try to keep things simple at the server end by hosting new services on separate, dedicated servers. The resulting proliferation of new servers has led to ever-greater strains on power, cooling, and network infrastructures in a facility that was significantly renovated less than two years ago. And I don't see any near-term likelihood that this will change. We are, consequently, in the very early days of investigating virtualization technology as a means of reducing the number of physical boxes and making much better use of the resources, especially processor and RAM, available to current-generation hardware. I'm hoping that someone among our readership is farther along this path than we are and will consider submitting to ITAL a "how we done it" on virtualization in the library server room very soon!

4. Sometimes low-tech solutions work . . . No one here has failed to observe the irony of an overheated server room when the temperature just steps away is 30° below. Our first thought was how simple and elegant a solution it would be to install ducting, an intake fan, and a damper to the outside of the building. Then, the next time our cooling failed in the depths of winter, voila, we could solve the problem with a mere turn of the damper control.

5. . . . And sometimes they don't. Not quite, it seems. When asked, our university facilities experts told us that an even greater irony than the one we currently have would be the requirement for CAN$100,000 in equipment to heat that -30°C outside air to around freezing, so that we wouldn't freeze pipes and other indoor essentials if we were to adopt the "low-tech" approach and rely on Mother Nature. Oh, well . . .
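Lessons 1 and 2 above amount to keeping a small, machine-readable inventory that maps services to the servers they depend on and ranks them by shutdown priority. The Python sketch below is one hypothetical way to record such a map and query it in both directions; every server name, service name, priority value, and the proportional shed rule are invented for illustration and are not drawn from the incident described above.

# Hypothetical service inventory: which servers each service depends on,
# and a shutdown priority (larger number = shut down sooner). All names
# and numbers are invented examples.
from typing import Dict, List

SERVICES: Dict[str, dict] = {
    "opac":             {"servers": ["db01", "app01", "web01"], "priority": 1},
    "ils-staff-client": {"servers": ["db01", "app01"],          "priority": 2},
    "e-resource-proxy": {"servers": ["web02"],                  "priority": 2},
    "email":            {"servers": ["mail01"],                 "priority": 3},
    "file-and-print":   {"servers": ["fs01"],                   "priority": 4},
}

def servers_for(service: str) -> List[str]:
    # Dependencies read one way: the servers that must be up for this service.
    return SERVICES[service]["servers"]

def services_on(server: str) -> List[str]:
    # Dependencies read the other way: the services affected if this server goes down.
    return [name for name, info in SERVICES.items() if server in info["servers"]]

def shutdown_candidates(capacity_fraction: float) -> List[str]:
    # When cooling (or power) is at a fraction of normal capacity, suggest which
    # services to shed first: lowest-priority services go first, and roughly the
    # lost fraction of services is shed. The proportional rule is only a
    # placeholder for whatever policy a site actually adopts.
    ranked = sorted(SERVICES, key=lambda s: SERVICES[s]["priority"], reverse=True)
    to_shed = round(len(ranked) * (1.0 - capacity_fraction))
    return ranked[:to_shed]

print(services_on("db01"))          # ['opac', 'ils-staff-client']
print(shutdown_candidates(0.5))     # ['file-and-print', 'email']

Even a flat inventory like this, kept current, answers in seconds the two questions the incident raised: what is running on a given box, and what should come down first when capacity drops.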
In Memoriam

Most of the snail mail I receive as editor consists of advertisements and press releases from various firms providing IT and other services to libraries. But a few months ago a thin, hand-addressed envelope, postmarked Pittsburgh with no return address, landed on my desk. Inside were two slips of paper clipped from a recent issue of ITAL and taped together. On one was my name and address; the other was a mailing label for Jean A. Guasco of Pittsburgh, an ALA life member and ITAL subscriber. Beside her name, in red felt-tip pen, someone had written simply "deceased."

I wondered about this for some time. Who was Ms. Guasco? Where had she worked, and when? Had she published or otherwise been active professionally? If she was a life member of ALA, surely it would be easy to find out more. It turns out that such is not the case, the wonders of the Internet notwithstanding. My obvious first stop, Google, yielded little other than a brief notice of her death in a Pittsburgh-area newspaper and an entry from a digitized September 1967 issue of Special Libraries that identified her committee assignment in the Special Libraries Association and the fact that she was at the time the chief librarian at McGraw-Hill, then located in New York. As a result of checking WorldCat, where I found a listing for her master's thesis, I learned that she graduated from the now-closed School of Library Service at Columbia University in 1953. If she published further, there was no mention of it on Google. My subsequent searches under her name in the standard online LIS indexes drew blanks.

From there, the trail got even colder. McGraw-Hill long ago forsook New York for the wilds of Ohio, and it seems that we as a profession have not been very good at retaining for posterity our directories of those in the field. A friend managed to find listings in both the 1982-83 and 1984-85 volumes of Who's Who in Special Libraries, but all these did was confirm what I already knew: Ms. Guasco was an ALA life member, who by then lived in Pittsburgh. I'm guessing that she was then retired, since her death notice gave her age as eighty-six years. Of her professional career before that, I'm sad that I must say I was able to learn no more.

President's Message: UX Thinking and the LITA Member Experience
Rachel Vacek

Rachel Vacek (revacek@uh.edu) is LITA President 2014-15 and Head of Web Services, University Libraries, University of Houston, Houston, Texas.

My mind has been occupied lately with user experience (UX) thinking in both the web world and in the physical world around me. I manage a web services department in an academic library, and it's my department's responsibility to contemplate how best to present website content so students can easily search for the articles they are looking for, or so faculty can quickly navigate to their favorite database. In addition to making these tasks easy and efficient, we want to make sure that users feel good about their accomplishments. My department has to ensure that the other systems and services that are integrated throughout the site are located in meaningful places and can be used at the point of need.
Additionally, the site's graphic and interaction design must not only contribute to but also enhance the overall user experience. We care about usability, graphic design, and the user interfaces of our library's web presence, but these are just subsets of the larger UX picture. For example, a site can have a great user interface and design, but if a user can't get to the actual information she is looking for, the overall experience is less than desirable.

Jesse James Garrett is considered to be one of the founding fathers of user-centered design, the creator of the pivotal diagram defining the elements of user experience, and the author of the book The Elements of User Experience. He believes that "experience design is the design of anything, independent of medium, or across media, with human experience as an explicit outcome, and human engagement as an explicit goal."1 In other words, applying a UX approach to thinking involves paying attention to a person's behaviors, feelings, and attitudes about a particular product, system, or service. Someone who does UX design, therefore, focuses on building the relationship between people and the products, systems, and services with which they interact. Garrett provides a roadmap of sorts for us by identifying and defining the elements of a web user experience, some of which are the visual, interface, and interaction design, the information architecture, and user needs.2 In time, these come together to form a cohesive, holistic approach to impacting our users' overarching experience across our library's web presence. Paying attention to these more contextual elements informs the development and management of a web site.

Let's switch gears for a moment. Prior to winning the election and becoming the LITA Vice-President/President-Elect, I reflected on my experiences as a new LITA member, before I became really engaged within the association. I endeavored to remember how I felt when I had joined LITA in 2005. Was I welcomed and informed, or did I feel distant and uninformed? Was the path clear to getting involved in interest groups and committees, or were there barriers that prevented me from getting engaged? What was my attitude about the overall organization? How were my feelings about LITA impacted?

Luckily, there were multiple times when I felt embraced by LITA members, such as participating in BIGWIG's Social Media Showcase, teaching pre-conferences, hanging out at the happy hours, and attending the forums. I discovered ample networking opportunities, and around every corner there always seemed to be a way to get involved. I attended as many LITA programs at Annual and Midwinter conferences as I could, and in doing so ran into the same crowds of people over and over again. Plus, the sessions I attended always had excellent content and friendly, knowledgeable speakers. Over time, many of these members became some of my friends and most trusted colleagues. Unfortunately, I'm confident that not every LITA member or prospective member has had similar, consistent, or as engaging experiences as I've had, or as many opportunities to travel to conferences and network in person. We all have different expectations and goals that color our personal experiences in interacting with LITA and its members. One of my goals as LITA president is to enhance the member experience.
I want to apply the user experience design concepts that I'm so familiar with to effect change and improve the overall experience for current members and those who are on the fence about joining. To be clear, when I say LITA member, I am including board members, committee members and chairs, interest group members and chairs, representatives, and those just observing on the sidelines. We are all LITA members and deserve to have a good experience no matter our level within the organization.

So what does "member experience" really mean? Don Norman, author of The Design of Everyday Things and the man credited with coining the phrase "user experience," explains that "user experience encompasses all aspects of the end-user's interaction with the company, its services, and its products."3 Therefore, I would say that the LITA member experience encompasses all aspects of a member's interaction with the association, including its programming, educational opportunities, publications, events, and even other members.

I believe that there are several components that define a good member experience. First, we have to ensure quality, coherence, and consistency in programming, publications, educational opportunities, communications and marketing, conferences, and networking opportunities. Second, we need to pay attention to our members' needs and wants as well as their motivations for joining. This means we have to engage with our members more on a personal level, discover their interests and strengths, and help them get involved in LITA in ways that benefit the association as well as assist them in reaching their professional goals. Third, we need to be welcoming and recognize that first impressions are crucial to gaining new members and retaining current ones. Think about how you felt and what you thought when you received a product that really impressed you, or when you started an exciting new job, or even used a clean and usable web site. If your initial impression was positive, you were more likely to connect with the product, environment, or website. If prospective and relatively new LITA members experience a good first impression, they are more likely to join or renew their membership. They feel like they are part of a community that cares about them and their future. That experience becomes meaningful.

Finally, the fourth component of a good member experience is that we need to stop looking at the tangible benefits that we provide to members as the only things that matter. Sure, it's great to get discounts on workshops and webinars or be able to vote in an election and get appointed to a committee, but we can't continue to focus on these offerings alone. We need to assess the way we communicate through email, social media, and our web page and determine whether it adds to or detracts from the member experience. What is the first impression someone might have in looking at the content and design of LITA's web page? Do the presenters for our educational programs feel valued? Does ITAL contain innovative and useful information? Is the process for joining LITA, or volunteering to be on a committee, simple, complex, or unbearable? What kinds of interactions do members have with the LITA Board or the LITA staff? These less tangible interactions are highly contextual and can add to or detract from our current and prospective members' abilities to meet their own goals, measure satisfaction, or define success.
As LITA President, and with the assistance of the Board of Directors, there are several things we have done or intend to do to help LITA embrace UX thinking:

• We have implemented a chair and vice-chair model for committees so that there is a smoother transition and the vice-chair can learn the responsibilities of the chair role prior to being in that role.
• We have established a new Communications Committee that will create a communication strategy focused on communicating LITA's mission, vision, goals, and relevant and timely news to LITA membership across various communication channels.
• We are encouraging our committees to create more robust documentation.
• We are creating richer documentation that supports the workings of the board.
• We are creating documentation and training materials for LITA representatives to complement the materials we have for committee chairs.
• We have disbanded committees that no longer serve a purpose at the LITA level and whose concerns are now addressed in groups higher within ALA.
• The Assessment and Research Committee is preparing to do a membership survey. The last one was done in 2007.
• We are going to be holding a few virtual and in-person LITA "kitchen table conversations" in the fall of 2014 to assist with strategic planning and to discuss how LITA's goals align with ALA's strategic goals of information policy, professional development, and advocacy.
• The Membership Development Committee is exploring how to more easily and frequently reach out to, engage, appreciate, acknowledge, and highlight current and prospective members. They will work closely with the Communications Committee.

I believe that we've arrived at a time where it's crucial that we employ UX thinking at a more pragmatic and systematic level and treat it as our strategic partner when exploring how to improve LITA and help the association evolve to meet the needs of today's library and information professionals. Garrett summarizes my argument nicely. He says, "What makes people passionate, pure and simple, is great experiences. If they have great experience with your product [and] they have great experiences with your service, they're going to be passionate about your brand, they're going to be committed to it. That's how you build that kind of commitment."4 I personally am very passionate about and committed to LITA, and I truly believe that our UX efforts will positively impact your experience as a LITA member.

References

1. http://uxdesign.com/events/article/state-of-ux-design-garrett/203. Garrett said this in a presentation entitled "State of User Experience" that he gave during UX Week 2009, a popular conference for UX designers.
2. http://www.jjg.net/elements/pdf/elements.pdf
3. http://www.nngroup.com/articles/definition-user-experience/
4. http://www.teresabrazen.com/podcasts/what-the-heck-is-user-experience-design. Garrett said this in a podcast interview with Teresa Brazen, "What the Heck Is User Experience Design??!! (And Why Should I Care?)"
News and Announcements

Programmers Discussion Group Meets: PL/I, the MARC Format, and Holdings

Twenty-two computer programmers, analysts, and managers met on June 29 in San Francisco for the formative meeting of the LITA/ISAS Programmers Discussion Group. In an informal and informative hour, the group established ground rules, started a mailing list, planned the topic for Midwinter 1982, and found out more about practices in fifteen library-related installations.

Programming Language Usage

What programming languages are used, and used primarily, at the installations? Nine languages turned up, excluding database management systems (and lumping all "assembly" languages together), but one language accounted for more than one-half of the responses:

Language                        Users   Primary
PL/I                              14      13
Assembler/assembly languages       8       5
COBOL                              4       2
Pascal                             3       1
BASIC                              1       1
C                                  1       1
MIIS (a MUMPS dialect)             1       -
FORTRAN                            0       -
SNOBOL                             0       -

(Note: some installations use more than one "primary" language.)

A second round of hands showed only four users with no use of PL/I.

MARC Format Usage

These questions were asked on an agency-by-agency basis. One agency made no use of the MARC communications format. None of those receiving MARC-format tapes were unable to recreate the format. Eight of the fifteen agencies made significant internal-processing use of the MARC communications format structure, including the leader, directory, and character storage patterns; this question was made more explicit to try to narrow the answers. Thus, the MARC communications format is used as a processing format in a significant number of institutions. Only three agencies use ASCII internally; most use of MARC takes place within EBCDIC. (All but three agencies were using IBM 360/370-equivalent computers; the parallel is clear.)

Computer Usage

As noted, all but three agencies use IBM equivalents in the mainframe range; three of those use plug-compatible equipment such as Magnuson and Amdahl. The other major computers are CDC, DEC/VAX, and Data General Eclipse systems. Smaller computers in use include DG, DEC 11/70, Datapoint, and IBM Series/1 units.

Home Terminals and Computers

Four of those present currently have home terminals. Three have home computers.

Future Plans for the Discussion Group

The Midwinter 1982 topic will be "holdings," with some emphasis on dealing with holdings formats in various technical processing systems (such as OCLC, UTLAS, WLN, RLIN). An announcement and mailing list will go to all those on the mailing list, as will an October/November mailing with questions sent to the chair. Those interested should send their names and addresses to Walt Crawford, RLG, Jordan Quad, Stanford, CA 94305. It is anticipated that papers on the topic may be ready by Midwinter; questions and comments are welcomed. Note: there will be no set speakers or panelists; this will be a true discussion group. The topic for the Philadelphia meeting will be set at Midwinter 1982. - Walt Crawford, Chair, The Research Libraries Group, Inc.

Channel 2000

A test of a viewdata system called Channel 2000 was conducted by OCLC in Columbus, Ohio, during the last quarter of 1980.
An outgrowth of the OCLC Research Department's Home Delivery of Library Services program, Channel 2000 was developed and tested to investigate technical, business, market, and social issues involved in electronic delivery of information using videotex technology.

Data Collection

Throughout the test, data were collected in three ways. Transaction logs were maintained, recording the keystrokes of each user during the test, thus allowing future analyses and reconstruction of the test sessions. Questionnaires requesting demographic information, life-style, opinion leadership, and attitudes toward Channel 2000 were collected from each user in each household before, during, and after the test. Six focus-group interviews were held and audiotaped to obtain specific user responses to the information services.

Attitudes toward Library Services

Forty-six percent of the respondents agreed that Channel 2000 saved time in getting books from the library. Responding to other questions, 29 percent felt that they would rather go to a traditional library than order books through Channel 2000, and 38 percent of the users felt that Channel 2000 had no effect on their library attendance. Forty-one percent of the Channel 2000 test group felt that their knowledge of library services increased as a result of the Channel 2000 test. In addition, 16 percent of the respondents stated that they spent more time reading books than they did before the test. Eighty-two percent of the respondents felt that public libraries should spend tax dollars on services such as Channel 2000. Although this might suggest that library viewdata services should be tax-based, subsequent focus-group interviews indicated that remote use of these services should be paid for by the individual, whereas on-site use should be "free." Sixty-three percent of the test population stated that they would probably subscribe to and pay for a viewdata library service, if the services were made available to them off-site.

Purchase Intent

Respondents were asked to rank-order the seven Channel 2000 services according to the likelihood that they would pay money to have that service in their home. A mean score was calculated for each Channel 2000 service, and the following list shows the rank order of preference.

1. Video Encyclopedia: locate any of 32,000 articles in the new Academic American Encyclopedia via one of three easy look-up indexes.
2. Video Catalog: browse through the video card catalog of the Public Libraries of Columbus and Franklin County, and select books to be mailed directly to your home.
3. Home Banking: pay your bills; check the status of your checking and savings accounts; look up the balance of your Visa credit card; look up your mortgage and installment loans; get current information on Bank One interest rates.
4. Public Information: become aware of public and legislative information in Ohio.
5. Columbus Calendar: check the monthly calendar of events for local educational and entertainment happenings.
6. Math That Counts!: teach your children basic mathematics, including counting and simple word problems.
7. Early Reader: help your children learn to read by reinforcing word relationships.

The final report, mailed to all OCLC member libraries, was published as Channel 2000: Description and Findings of a Viewdata Test Conducted by OCLC in Columbus, Ohio, October-December 1980. Dublin, Ohio: Research Department, Online Computer Library Center, Inc., 1981. 21p.
NOTIS Software Available

At the 1981 ALA Annual Conference in San Francisco, the Northwestern University Library announced the availability of version 3.2 of the NOTIS computer system. Intended for medium and large research libraries or groups of libraries, NOTIS provides comprehensive online integrated-processing capabilities for cataloging, acquisitions, and serials control. Patron access by author and title has been in operation for more than a year, and version 3.2 adds subject-access capability as well as other new features. An improved circulation module and other enhancements are under development for future release. Although NOTIS, which runs on standard IBM or IBM-compatible hardware, has been in use by the National Library of Venezuela for several years, Northwestern only recently decided to actively market the software, and provided a demonstration at the ALA conference. A contract has been signed with the University of Florida, and several other installations are expected within a few months. Further information on NOTIS may be obtained from the Northwestern University Library, 1935 Sheridan Rd., Evanston, IL 60201.

Bibliographic Access & Control System

The Washington University School of Medicine Library announces its computer-based online catalog/library control system, known as the Bibliographic Access & Control System (BACS). The system is now in operation and utilizes MARC cataloging records obtained from OCLC since 1975, serials records from the PHILSOM serials control network, and machine-readable patron records. Features of interest in the system are:

1. Patron access by author, title, subject, call number, or combination of keywords. The public-access feature has been in operation since May 1981. Online instructions support system use, minimizing staff intervention. A user survey indicates a high degree of satisfaction with the system.
2. Low-cost public access terminal with a specially designed overlay board.
3. Barcode-based circulation system featuring the usual functions, including recalls for high-demand items, overdue notices, suspension of circulation privileges, etc.
4. Cataloging records loaded from OCLC MARC records by tape and from a microcomputer interface at the OCLC printer port. Authority control available on three levels: (a) controlled authority, i.e., MeSH or LC; (b) library-specific assigned authority; and (c) word list available to the user.
5. Full cataloging functions online, including editing, deleting, and entering records.
6. Serials control from the PHILSOM system. PHILSOM is an online distributed computer network that currently controls serials for sixteen medical school libraries. PHILSOM features rapid online check-in, claims, fiscal control, union lists, and management reports.
7. Five possible displays of the basic bibliographic record, varying from a brief record for the public access terminal to complete information for cataloging and reference staff.
8. Two levels of documentation available online.

The software is available to interested libraries, bibliographic utilities, or commercial firms. Contact: Washington University School of Medicine Library, 4580 Scott, St. Louis, MO 63110; (314) 454-3711.

. . . the desperation from a downtime situation. Great Neck Library is also planning to use the Apples for other functions, which, it is hoped, will be implemented soon.

Multimedia Catalog: COM and Online
Kenneth J. Bierman: Tucson Public Library, Tucson, Arizona.

Like many public libraries, the Tucson Public Library (TPL) is closing its card catalog and implementing a vendor-supplied microform catalog. Unlike most of these other libraries, however, the TPL microform catalog will not include location or holding information. The indication of where copies of a particular title are actually available (i.e., which of the fifteen possible branch locations) will be available only by accessing a video display terminal connected to the online circulation and inventory control system.

Conceptually, the TPL catalog will be in two parts, with each part intended to serve different functions.1 The microform catalog (copies available in both film and fiche format) will fulfill the bibliographic function of the catalog. This catalog will contain bibliographic description and provide the traditional access points of author, title, and subject. The online catalog (online terminals are in place at all reference desks, and a few public access terminals will also be available) will fulfill the finding or locating function of the catalog. This catalog will contain very brief bibliographic description, will only be searchable by author, title, author/title, and call number, and will contain the current status of every copy of every title in the library system (i.e., on shelf, checked out, at bindery, reported missing, etc.).

Why did the Tucson Public Library make this decision? There are two major reasons:

1. Accuracy. The location information, if provided in the microform catalog, would always be inaccurate and out of date. Assuming that the locations listed in the latest edition of the microform catalog were completely accurate when the catalog was first issued (an unrealistic assumption to begin with, as anyone who has ever worked with location information at a public library with many branches well knows!), the location information would become increasingly less accurate with each day because of the large number of withdrawals, transfers, and added-copy transactions that occur (more than 100,000 a year). In addition, at any given time, one-quarter to one-third of the materials in busy branches are not on the shelf because they are either checked out or waiting to be reshelved. Thus, the microform catalog would indicate that these materials were available at specific branches when a significant percentage would in fact not be available at any given time. In short, even in the best of circumstances, easily half of the location information would be incorrect in telling a user where a copy of a title was actually available at that moment.

2. Cost. A study done at the Tucson Public Library indicated that close to half of the staff time of the cataloging department was spent dealing with location and holding information. This time includes handling transfers, withdrawals, and added copies. All of this record keeping is already being done as a part of the online circulation and inventory control system (the Tucson Public Library has no card shelflist containing copy and location information but rather relies completely on the online file for this type of information). To "duplicate" the information in the microform catalog would cost an estimated $40,000 to $60,000 a year, and the information in the microform catalog would never be accurate or up to date, for the reasons outlined above.

Figure 1 is a brief summary of how the bibliographic system will work.
would the system in figure 1 be improved if holdings were included in the microform catalog? on the surface, the obvious answer is yes-more information is communications 111 known-item search (37 percent of tpl catalog use according to catalog use survey conducted at the tpl in 1971) user searches microform catalog by author and/or title. if user does not find desired bibliographic entry, user either leaves unsatisfied or goes to desk (or public access terminal) for help. if user finds the desired bibliographic entry, he/she writes down call number (or author for fiction) and proceeds to shelf. if user finds book on shelf he/she checks it out. if user does not find book on shelf, user either leaves unsatisfied or goes to desk (or public access terminal) to obtain holdings information or ask for help (put on reserve, borrow from another library, possible purchase of additional copies, etc.). subject search (63 percent of tpl catalog use by public according to catalog use survey conducted at the tpl in 1971) user searches microform catalog. user writes down call number(s) and proceeds to shelf. if user finds appropriate material(s), he/she checks it out. if user does not find appropriate material he/she leaves unsatisfied or goes to desk for help (reference interview, etc.) . fig. 1. summary of how system will work. always better. but, if we examine the situation in depth, perhaps not. let us look at some hypothetical situations. if the user is doing a search and does not find the desired entry/entries in the microform catalog, it makes no difference whether holdings are included in the catalog. the user will still either leave unsatisfied or go to the desk for help. if the user is doing a known-item search and finds the desired item and notes, and the agency he/she is at is listed as a holding agency, he/she will proceed to the shelf. if the desired material is found, fine . if not (because the material is checked out, reported missing, or withdrawn), he/she will either leave unsatisfied or go to the desk (or public access terminal) for help. if the user is doing a known-item search and finds the desired item in the microform catalog but notes that the agency is not listed as a holding agency, what are his/her choices? the user can go away unsatisfied without checking the shelves (although there may be a copy on the shelf because a copy may have been added to that agency since the microform catalog was last recumulated) or he/she can go to the desk (or public access terminal) to obtain help; here he/she will have access to the "real" holdings information--on the online system. the user could notice from the holdings in the microform catalog that another branch has the item and drive to the other branch. however, when the user gets there he/she may discover that the item is not available-information that could have been found in the online system at the original branch if he/she had gone to the desk (or public access terminal). · the purpose of the above exercise is to demonstrate that in all cases the user is still going to require access to the online catalog in order to determine holdings more accurately. with time, this access will become increasingly self-service through public access terminals. 
from the user's point of view, providing inaccurate holdings in the microform catalog does very little good and can actually do harm by leaving the impression that, if a library is listed as a holding library, that library will have the item (a false conclusion because of checkouts, reported missings, and withdrawals) or leaving the impression that if a library is not listed as a holding library, that library will not have the item (a false conclusion because a copy could have been added recently but that fact is not yet reflected in the microform catalog) . if the user is doing a subject search, holdings are of less value in the catalog 112 journal of library automation vol. 14/2 june 1981 anyway because he is primarily getting suggested classification numbers in order to browse. the tucson public library could not have made the above decisions if it did not have a complete online file of all its holdings (including even reference materials that never circulate). but since this data did exist (after a five-year bar-coding effort) and since more than forty online terminals were already in place throughout the library system to access the online file, the decision not to include locations or holdings in the microform catalog seemed reasonable . in the longer-range future (1990?), it is very likely that the entire catalog will be available online . in the meantime, the tucson public library did not want to divide its resources maintaining two location records, but rather wanted to concentrate resources in maintaining one accurate record of locations available as widely as possible throughout the library system (by installing more online terminals for staff and public use). was this decision a sound one? we don't know. the microform catalog has not yet been introduced for public use. by the end of this year we should have some preliminary answers to this question. references 1. robin w. macdonald and j. mcree elrod, "an approach to developing computer catalogs," college & research libraries 34:202-8 (may 1973). a structure code for machine readable library catalog record formats herbert h. hoffman: santa ana college, santa ana, california. libraries house many types of publications in many media, mostly print on paper, but also pictures on paper, print and pictures on film, recorded sound on plastic discs, and others. these publications are of interest to people because they contain recorded information. more precisely said, because they contain units of intellectual, artistic, or scholarly creation that collectively can be called "works." one could say simply that library materials consist of documents that are stored and cataloged because they contain works. the structure of publications into documents (or "books") and works, the clear distinction between the concept of the information container as opposed to the contents, deserves more attention than it has received so far from bibliographers and librarians. the importance of the distinction between books and works has been hinted at by several theoreticians, notably lubetzky . however, the idea was never fully developed. the cataloging implications of the structural diversity among documents were left unexplored. as a consequence, librarians have never disentangled the two terms book and work . 
from the paris principles and the marc formats to the new second edition of the anglo-american cataloguing rules, the terms book and work are used loosely and interchangeably, now meaning a book, now a work proper, now part of a work, now a group of books. such ambiguity can be tolerated as long as each person involved knows at each step which definition is appropriate when the term comes up. but as libraries ease into the age of electronic utilities and computerized catalogs based on records read by machine rather than interpreted by humans, a considerably greater measure of precision will have to be introduced into library work. as one step toward that goal an examination of the structure of publications will be in order. the items that are housed in libraries, regardless of medium, are of two types. they are either single documents, or they are groups of two or more documents. items that contain two or more documents are either finite items (all published at once, or with a first and a last volume identified) or they are infinite items (periodicals, intended to be continued indefinitely at intervals). schematically, these three types of bibliographic items in libraries can be represented as shown in figure 1. it should be noted that all publications, all documents, all bibliographic items in li technical note help: the automated binding records control system an interesting new aspect of library automation has been the appearance of commercial ventures established to provide for an effective use of the new ideas and techniques of automation and related fields. some of these ventures have offered the latest in information science research and development techniques, such as systems analysis, management planning, and operations research. others have offered services based on new procedures, for example, computer-produced book catalogs, selective dissemination of information services, indexing and abstracting activities, mechanized acquisitions, and catalog card production systems. one innovation is a new technique devised for libraries to reduce the clerical effort required to prepare materials for binding and to maintain the necessary related records. the technique is called help, the heckman electronic library program. it was developed by the heckman bindery of north manchester, indiana, with the cooperation of the purdue university libraries. it was recognized by heckman's management that the processing of 10,000 to 20,000 periodicals weekly and the maintenance of over 250,000 binding patterns would soon become too unwieldy and costly unless more efficient procedures were developed. it was additionally realized that any new system should also be designed as a means to aid libraries with their interminable record-keeping problems. the latter purpose could be accomplished by providing a library with detailed and accurate information regarding each periodical it binds, and by simplifying the library's method of preparing binding slips for the bindery. in the fall of 1969, after a detailed analysis, the heckman bindery management began the development and programming of a computerized binding pattern system. this system was a result of a team effort involving management, sales, and production departments. john pilkington, data processing manager, directed the installation of the system and earl beal performed the necessary programming functions.
in december of 1971 approximately 700 libraries were using the system, and about 100,000 binding patterns were in the data file. as the system was developed, a library's binding pattern data were converted to machine-readable form which then made it possible for the bindery automatically to provide nearly complete binding slips for each periodical title bound. in addition, the system provides an up-to-date pattern record for the libraries' files, and the bindery maintains the resultant data bank of pattern records as the library notifies it of additions, changes, and deletions. in this manner, the bindery expects to establish an efficient method for purging the file of out-of-date information. the system revolves around four forms: the binding pattern index card, the binding slip, the variable posting sheet, and the binding historical record. the binding pattern index card (figure 1) is a 5" x 8 1/2" card, pink in color, which is a computer printout. one of these cards is retained in the library as its pattern record for each set of each periodical bound by the library. the data given on the card are essentially the same as those maintained by most libraries in their manual pattern files, except that more detail is provided by the help system, and the library does not maintain the record-the bindery does-in machine-readable form. as changes are made to the patterns, the library clerk simply crosses out the old data on the appropriate binding slip and writes in the new data. when the bindery receives the binding slip, a new index card is produced, among other records, and forwarded to the library with the returned shipment of bound volumes. the system also provides for one-time changes that do not affect the pattern record. the data contained on the index cards include the library account number, the library branch or department code, the pattern number, color, type size, stamping position, title (vertical or horizontal spine positions), labels, call number, library imprint, and collating instructions. the collating instructions, which are listed in the instruction manual provided by the bindery, are given as a series of numeric codes. asterisks are used to indicate the end of a print line.
fig. 1. binding pattern index card.
the binding slips are also 5" x 8 1/2" forms, but they are four-part multiple forms, of which three parts are sent to the bindery with the periodical to be bound, and one part, a card form, is retained by the library as its "at bindery" record. the information required by the binding slip is essentially the same as that included on the index card. the library, however, must provide the variable data such as volume number(s), date(s), month(s), or whatever information is required to identify a specific volume. the variable posting sheet (figure 2) is an 8 1/2" x 11" form that is used by the library when it sends several volumes or copies of a volume to the bindery at the same time. since the bindery cannot determine beforehand the number of physical volumes of a title a library will want to send for binding at a given time, it sends to the library only one printed-out binding slip to be used for the next volume of a given serial. if multiple volumes of a set are to be bound, the library clerk provides the variable information for the first volume by using the single binding slip, and the variable data for each additional volume of the same title are posted by the clerk on the posting sheet. the bindery will automatically produce from its pattern data bank the binding slips necessary for binding the additional volumes that are listed on the posting sheet.
fig. 2. variable posting sheet.
the binding historical record (figure 3) is a form provided for the use of the library if it desires a permanent record of every volume bound. the use of this form is not required by the system; it is simply a convenience record for the library binding staff. the form is printed on the back of the pattern index card. spaces are provided for volume, year, and date sent to the bindery, and most of the back of the card is available for posting.
fig. 3. binding historical record.
all data fields are of fixed length with the maximum size of the records at 328 characters. some of the data formats are shown in figure 4. a few of the data fields in the example need additional explanation. the fifth field labeled "print" refers to the color of the spine stamping, i.e., gold, black, or white. the "trim #1 & 2" fields are for bindery use only, and indicate volume size within certain groups for printing purposes. the "spine" field is also for bindery use, and it indicates the size of type that can be used according to the width of the spine. "product no." refers to certain types of publications such as magazines, matched sets, or items which will be pamphlet (inexpensively) bound.
fig. 4. data formats.
one additional form used in the system is for heckman's internal operations. that is a data input form known as the "pattern printing setup" (figure 5). this form is used by the bindery's input clerks to prepare new binding patterns for conversion to machine-readable form. the data prescribed by the form is much like that required by the binding pattern index card, except that data tags are shown for keypunching purposes.
fig. 5. pattern printing setup.
the system operates on an ibm system 3 computer with two 5445 disk drives and a 1403nl printer. the disk drives provide a total of 40,000,000 characters of on-line storage in addition to the 7,500,000 usable characters provided by the system 3 itself. five 5496 data recorders are used for data conversion. the programs are written in rpg2. the development of computer-oriented commercial services for libraries suggests that, perhaps if librarians wait long enough, they will not have to automate their libraries as commercial ventures will do it for them. the rapid appearance of systems-analysis firms, commercial and societal abstracting and indexing services, management and planning consulting groups, and data processing service bureaus tends to bear this theory out. at the very least, libraries will not be able to automate internally without providing for the incorporation of such ready services into their systems. when a service such as help is made available at no additional charge, there is no way for libraries to avoid automation. donald p.
hammer donald p. hammer is associate director for library and information systems, university of massachusetts library, amherst. at the time the system described in this article was developed, mr. hammer was the head of libraries systems development at purdue university. tutorial andrew darby and ron gilmour adding delicious data to your library website social bookmarking services such as delicious offer a simple way of developing lists of library resources. this paper outlines various methods of incorporating data from a delicious account into a webpage. we begin with a description of delicious linkrolls and tagrolls, the simplest but least flexible method of displaying delicious results. we then describe three more advanced methods of manipulating delicious data using rss, json, and xml. code samples using php and javascript are provided. andrew darby (adarby@ithaca.edu) is web services librarian, and ron gilmour (rgilmour@ithaca.edu) is science librarian at ithaca college library, ithaca, new york. one of the primary components of web 2.0 is social bookmarking. social bookmarking services allow users to store bookmarks on the web where they are available from any computer and to share these bookmarks with other users. even better, these bookmarks can be annotated and tagged to provide multiple points of subject access. social bookmarking services have become popular with librarians as a means of quickly assembling lists of resources. since anything with a url can become a bookmark, such lists can combine diverse resource types such as webpages, scholarly articles, and library catalog records. it is often desirable for the data stored in a social bookmarking account to be displayed in the context of a library webpage. this creates consistent branding and a more professional appearance. delicious (http://delicious.com/), one of the most popular social bookmarking tools, allows users to extract data from their accounts and to display this data on their own websites. delicious offers multiple ways of doing this, from simply embedding html in the target webpage to interacting with the api.1 in this paper we will begin by looking at the simplest methods for users uncomfortable with programming, and then move on to three more advanced methods using rss, json, and xml. our examples use php, a cross-platform scripting language that may be run on either linux/unix or windows servers. while it is not possible for us to address the many environments (such as cmses) in which websites are constructed, our code should be adaptable to most contexts. this will be especially simple in the many popular php-based cmses such as drupal, joomla, and wordpress. it should be noted that the process of tagging resources in delicious requires little technical expertise, so the task of assembling lists of resources can be accomplished by any librarian. the construction of a website infrastructure (presumably by the library's webmaster) is a more complex task that may require some programming expertise. linkrolls and tagrolls the simplest way of sharing links is to point users directly to the desired delicious page.
figure 1. delicious linkroll page.
to share all the items labeled "biology" for the user account "iclibref," one could disseminate the url http://delicious.com/iclibref/biology.
the obvious downside is that the user is no longer on your website, and they may be confused by their new location and what they are supposed to do there. linkrolls, a utility available from the delicious site, provides a number of options for generating code to display a set of bookmarked links, including what tags to display, the number, the type of bullet, and the sorting criterion (see figure 1).2 this utility creates simple html code that can be added to a website. a related tool, tagrolls, creates the ubiquitous delicious tag cloud.3 for many librarians, this will be enough. with the embedded linkroll code, and perhaps a bit of css styling, they will be satisfied with the results. however, delicious also offers more advanced methods of interacting with data. for more control over how delicious data appears on a website, the user must interact with delicious through rss, json, or xml. rss like most web 2.0 applications, delicious makes its content available as rss feeds. feeds are available at a variety of levels, from the delicious system as a whole down to a particular tag in a particular account. within a library context, the most useful types of feeds will be those that point to lists of resources with a given tag. for example, the request http://feeds.delicious.com/rss/iclibref/biology returns the rss feed for the "biology" tag of the "iclibref" account, with each item carrying fields such as the following: title: darwin's dangerous idea (evolution 1); date: 2008-04-09T18:40:00Z; link: http://icarus.ithaca.edu/cgi-bin/pwebrecon.cgi?bbid=237870; creator: iclibref; description: this episode interweaves the drama in key moments of darwin's life with documentary sequences of current research, linking past to present and introducing major concepts of evolutionary theory. 2001; subject: biology. to display delicious rss results on a website, the webmaster must use some rss parsing tool in combination with a script to display the results. the xml_rss package provides an easy way to read rss using php.4 the code for such an operation might look like this:
<?php
require_once "XML/RSS.php";
$rss =& new XML_RSS("http://feeds.delicious.com/rss/iclibref/biology");
$rss->parse();
foreach ($rss->getItems() as $item) {
    echo "<li><a href=\"" . $item['link'] . "\">" . $item['title'] . "</a></li>";
}
?>
this code uses xml_rss to parse the rss feed and then prints out a list of linked results. rss is designed primarily as a current awareness tool. consequently, a delicious rss feed only returns the most recent thirty-one items. this makes sense from an rss perspective, but it will not often meet the needs of librarians who are using delicious as a repository of resources. despite this limitation, the delicious rss feed may be useful in cases where currency is relevant, such as lists of recently acquired materials. json a second method to retrieve results from delicious is using javascript object notation or json.5 as with the rss feed method, a request with credentials goes out to the delicious server. the response returns in json format, which can then be processed using javascript. an example request might be http://feeds.delicious.com/v2/json/iclibref/biology. by navigating to this url, the json response can be observed directly. a json response for a single record (formatted for readability) looks like this: delicious.posts = [ {"u":"http:\/\/icarus.ithaca.edu\/cgi-bin\/pwebrecon.cgi?bbid=237870", "d":"darwin's dangerous idea (evolution 1)", "t":["biology"], "dt":"2008-04-09T06:40:00Z", "n":"this episode interweaves the drama in key moments of darwin's life with documentary sequences of current research, linking past to present and introducing major concepts of evolutionary theory. 2001"} ]; it is instructive to look at the json feed because it displays the information elements that can be extracted: "u" for the url of the resource, "d" for the title, "t" for a comma-separated list of related tags, "n" for the note field, and "dt" for the timestamp. to display results in a webpage, the feed is requested using javascript (a script element whose src points at the json url); then the json objects must be looped through and displayed as desired. alternately, as in the script described below, the json objects may be placed into an array for sorting, so that all of the available data is displayed with each item in its own paragraph and the links are sorted alphabetically. (a server-side php sketch along these lines appears just below.) while rss returns a maximum of thirty-one entries, json allows a maximum of one hundred. the exact number of items returned may be modified through the count parameter at the end of the url. at the ithaca college library, we chose to use json because at the time, delicious did not offer the convenient tagrolls, and the results returned by rss were displayed in reverse chronological order and truncated at thirty-one items. currently, we have a single php page that can display any delicious result set within our library website template. librarians generate links with parameters that designate a page title, a comma-delimited list of desired tags, and whether or not item descriptions should be displayed. for example, www.ithacalibrary.com/research/delish_feed.php?label=biology%20films&tag=biology,biologyi&notes=yes will return a page that looks like figure 2. the advantage of this approach is that librarians can easily generate webpages on the fly and send the url to their faculty members or add it to a subject guide or other webpage. the php script only has to read the "$_get" variables from the url and then query delicious for this content. xml delicious offers an application programming interface (api) that returns xml results from queries passed to delicious through https.
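where the display must be generated on the server rather than in the visitor's browser, the json feed described above can also be handled in php. the following sketch is not part of the original tutorial; it assumes that the v2 json feed for the "iclibref" account and "biology" tag can be fetched with file_get_contents (allow_url_fopen enabled), that the count parameter behaves as described, and that the u, d, t, n, and dt keys match the sample response. the helper name by_title and the trimming of the delicious.posts = [ ... ] wrapper are illustrative only.
<?php
// fetch up to one hundred bookmarks for the "biology" tag of the "iclibref" account
// (account, tag, and count parameter are taken from the discussion above)
$url = "http://feeds.delicious.com/v2/json/iclibref/biology?count=100";
$raw = file_get_contents($url);

// if the response arrives wrapped as "delicious.posts = [ ... ];",
// keep only the bracketed array before decoding
$start = strpos($raw, "[");
$end   = strrpos($raw, "]");
if ($start !== false && $end !== false) {
    $raw = substr($raw, $start, $end - $start + 1);
}

$posts = json_decode($raw, true);   // decode into associative arrays
if (!is_array($posts)) {
    exit("could not decode the delicious feed");
}

// sort alphabetically by title ("d"), as the javascript version described above does
function by_title($a, $b) {
    return strcasecmp($a["d"], $b["d"]);
}
usort($posts, "by_title");

// print each bookmark as a linked paragraph followed by its note
foreach ($posts as $post) {
    $link  = htmlspecialchars($post["u"]);
    $title = htmlspecialchars($post["d"]);
    $note  = isset($post["n"]) ? htmlspecialchars($post["n"]) : "";
    echo "<p><a href=\"" . $link . "\">" . $title . "</a><br />" . $note . "</p>\n";
}
?>
because the work happens on the server, this variant does not depend on javascript being enabled in the visitor's browser, although the hundred-item ceiling of the json feed still applies. returning to the xml api: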
for instance, the request https://api.del.icio.us/v1/posts/recent?&tag=biology returns an xml document listing the fifteen most recent posts tagged as "biology" for a given account. unlike either the rss or the json methods, the xml api offers a means of retrieving all of the posts for a given tag by allowing requests such as https://api.del.icio.us/v1/posts/all?&tag=biology. this type of request is labor-intensive for the delicious server, so it is best to cache the results of such a query for future use. this involves the user writing the results of a request to a file on the server and then checking to see if such an archived file exists before issuing another request. a php utility called deliciousposts, which provides caching functionality, is available for free.6 note that the username is not part of the request and must be supplied separately. unlike the public rss or json feeds, using the xml api requires users to log in to their own account. from a script, this can be accomplished using the php curl functions:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $queryurl);
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$posts = curl_exec($ch);
curl_close($ch);
this code logs into a delicious account, passes it a query url, and makes the results of the query available as a string in the variable $posts. the content of $posts can then be processed as desired to create web content. one way of doing this is to use an xslt stylesheet to transform the results into html, which can then be printed to the browser:
/* create a new dom document from your stylesheet */
$xsl = new DOMDocument;
$xsl->load("mystylesheet.xsl");
/* set up the xslt processor */
$xp = new XSLTProcessor;
$xp->importStylesheet($xsl);
/* create another dom document from the contents of the $posts variable */
$doc = new DOMDocument;
$doc->loadXML($posts);
/* perform the xslt transformation and output the resulting html */
$html = $xp->transformToXml($doc);
echo $html;
conclusion delicious is a great tool for quickly and easily saving bookmarks. it also offers some very simple tools such as linkrolls and tagrolls to add delicious content to a website. but to exert more control over this data, the user must interact with the delicious api or feeds. we have outlined three different ways to accomplish this: rss is a familiar option and a good choice if the data is to be used in a feed reader, or if only the most recent items need be shown. json is perhaps the fastest method, but requires some basic scripting knowledge and can only display one hundred results. the xml option involves more programming but allows an unlimited number of results to be returned. all of these methods facilitate the use of delicious data within an existing website. references 1. delicious, tools, http://delicious.com/help/tools (accessed nov. 7, 2008). 2. linkrolls may be found from your delicious account by clicking settings > linkrolls, or directly by going to http://delicious.com/help/linkrolls (accessed nov. 7, 2008). 3. tagrolls may be found from your delicious account by clicking settings > tagrolls, or directly by going to http://delicious.com/help/tagrolls (accessed nov. 7, 2008). 4. martin jansen and clay loveless, "pear::package::xml_rss," http://pear.php.net/package/xml_rss (accessed nov. 7, 2008). 5. introducing json, http://json.org (accessed nov. 7, 2008). 6. ron gilmour, "deliciousposts," http://rongilmour.info/software/deliciousposts (accessed nov. 7, 2008).
communications tam dalrymple "just-in-case" answers: the twenty-first century vertical file this article discusses the use of oclc's questionpoint service for managing electronic publications and other items that fall outside the scope of oclc library's opac and web resources pages, yet need to be "put somewhere." the local knowledge base serves as both a collection development tool and as a virtual vertical file, with records that are easy to enter, search, update, or delete. we do not deliberately collect for the vertical file, but add to it day by day the useful thing which turns up. these include clippings from newspapers, excerpts from periodicals . . . broadsides that are not injured by folding . . . anything that we know will be used if available. —wilson bulletin, 1919 information that "will be used if available" sounds like the contents of the internet.1 as with libraries everywhere, the oclc library has come to depend on the internet as an almost limitless resource. and like libraries everywhere, it has confronted the advantages and disadvantages of that scope. this means that in addition to using the opac and oclc library's webpages, oclc library staff have used a mix of bookmarks, del.icio.us tags, and post-it® notes to keep track of relevant, authoritative, substantive, and potentially reusable information. much has been written about the use of questionpoint's transaction management capabilities and of the important role of knowledge bases in providing closure to an inquiry. in contrast, this article will look at questionpoint's use as a management tool for future questions, for items that fall outside the scope of oclc library's opac and web resources pages yet need to be "put somewhere." the questionpoint local knowledge base is just the spot for these new vertical file items. about oclc library oclc is the world's largest nonprofit membership computer library service and research organization. more than 69,000 libraries in 112 countries and territories around the world use oclc services to locate, acquire, catalog, lend, and preserve library materials. oclc library was established in 1977 to provide support for oclc's mission. the collection concentrates on library, information and computer sciences, business management, and has special collections that include the papers of frederick g. kilgour and archives of the dewey decimal classification™. oclc library has a distinct clientele to which it offers a complete range of services—print and electronic collections, reference, interlibrary loan—within its subject areas. because of the nature of the organization, the library supports long-term and collaborative research, such as that done by oclc programs and research staff, as well as the immediate information needs of product management and marketing staff. oclc library also provides information to oclc's other service areas, such as finance and human resources. while most oclc library acquisitions are done on demand, oclc library selects and maintains an extensive collection of periodicals, journals, and reference resources, most of them online and accessible—along with the opac—to oclc employees worldwide from the library's webpages (see figure 1).
often, however, oclc staff, like those of many organizations, are too busy to consult these resources themselves and thus depend on the library. oclc library staff pursue the answers to such research questions through its collections and look to enhance the collections with “anything that we know will be” of use. one of the challenges is keeping track of the “anything” that falls outside the library’s primary collections scope; questionpoint helps with that task. traditional uses of questionpoint questionpoint is a service that provides question management tools aimed at increasing the visibility of reference services and making them more efficient. oclc library uses many of those tools, but there are significant ones it does not use (for example, chat). and although the library’s questionpoint-based aska link is visible by default on the front page of the corporate intranet as well as on oclc library–specific pages, less than than 8 percent of questions over the last year were received through that link. one reason for this low use may be that for most of oclc library’s history, e-mail has been the primary contact method, and so it remains. even when the staff need clarification of a question, they automatically opt for telephone or e-mail messaging. working with a web form and question-and-answer software has not caught on as a replacement for these more established methods. however, questionpoint remains tam dalrymple (dalrympt@oclc.org) is senior information specialist at oclc, dublin, ohio. 26 information technology and libraries | december 200826 information technology and libraries | december 2008 the reference “workspace.” when questions come in through e-mail or phone, librarians enter them into questionpoint, using it to add notes and keep track of sources checked. completed transactions are added to the local knowledge base. (because their questions involve proprietary matters, many special libraries do not add their answers to the global knowledge base, and oclc library is no exception. the local knowledge base is accessible only by oclc library staff.) not surprisingly, most of the questions received are about libraries, museums, and other cultural institutions, their collections, users, and staff. this means that the likelihood of reuse of the information in the oclc library knowledge base is relatively high, and makes the local knowledge base an early stop in the reference process. though statistics vary widely by individual institutions and type of library—and though some libraries have opted not to use the knowledge base—the average ratio for all questionpoint libraries is about one knowledge base search for every three questions received. in contrast, in the past year oclc library staff averaged 4.2 local knowledge base searches for every three questions received. the view of the questionpoint knowledge base as a repository of answers to questions that have been asked is a traditional one. oclc library’s use of the questionpoint knowledge base in anticipation of information needs of its clients—as a way of collection development—is distinctive. in many respects this use creates an updated version of the oldfashioned vertical file. nontraditional uses of questionpoint just-in-case the vertical file has a quirky place in the annals of librarianship. it has been the repository for facts and information too good to throw away but not quite good enough to catalog. h. w. 
wilson still offers its vertical file index, a specialized subject index to pamphlets issued on topics often unavailable in book form, which began in 1932. by now, except for special collections, the internet has practically relegated the vertical file to the backroom with the card platens and electric erasers. oclc library now uses its questionpoint knowledge base to manage information that once might have gone into a vertical file: the authoritative reports, studies, .org sites, and other resources that are often not substantive enough to catalog, but too good to hide away in a single staff member’s bookmarks. the questionpoint knowledge base provides a place for these resources; more importantly, questionpoint provides fast, efficient ways to collect, tag, manage, and use them. questionpoint allows development of such collections with powerful capabilities that allow for future retrieval and use of the information, and it does so without the incredibly time-consuming processes of the past. a 1909 description of such processes describes in detail the inefficiency of yore: in the public library [sic] of newark, n.j., material is filed in folders made of no. 1 tag manila paper, cut into pieces about 11x18 inches in size. one end is so turned up against the others as to make a receptacle 11x19 1/2 inches. the front fold is a half inch shorter than the back one, and this leaves a margin exposed on the back one, whereon the subject of that folder is written.2 thus a major benefit of using questionpoint to manage these resources is saving time. because questionpoint is a routine part of oclc library’s workflow, it allows the addition of items directly to the figure 1. oclc library intranet homepage introducing zoomify image | smith 27“just in case” answers: the twenty-first-century vertical file | dalrymple 27 knowledge base quickly and with a minimum of fuss. there is initially no need to make the entry “pretty,” but only to describe the resource briefly, add the url, and tag it (see figure 2). unlike a physical vertical file, tagging items in the knowledge base allows items to be “put” in multiple places. staff can also add comments that characterize the authoritativeness of a resource. occasionally librarians come across articles or resources that might address multiple questions. instead of burying the data in one overarching knowledge base record, staff can make an entry for each aspect of the resource. an example of this is www .galbithink.org/libraries/analysis. htm, a page created by douglas galbi, senior economist with the federal communications commission (see figure 3). the site provides statistics, including historical statistics, on u.s. public libraries. rather than describe these generically with a tag like “library statistics”—not very useful in any case—each source can be added separately to the questionpoint knowledge base. for example, the item “audiovisual materials in u.s. public libraries” can be assigned specific tags—audiovisual, av, videos—that will make the data more accessible in the future. in other words, librarians use the faq model of asking and answering just one question at a time. an important element in adding “answers” to oclc library’s knowledge base is the ability to provide context. with questionpoint, librarians can not only describe what the resource is, but why it may be of future use. 
and just the act of adding information to the knowledge base serves as a valuable mnemonic— “i’ve seen that somewhere.” records added to the knowledge base in this way can be easily updated with information about newer editions or better sources. equally valuable is the ability to edit and add keywords when the resource becomes useful for unforeseen questions. sharing information with staff the knowledge base also serves as a more formal collection development tool. when librarians run across potentially valuable resources, they can send a description and a link to a product manager who may find it of use. library staff use questionpoint’s keyword capability to add tags of people’s names and job titles to facilitate ongoing current awareness. employees may provide feedback suggesting an item be added to the figure 3. a page with diverse facts and figures: www.galbithink.org/libraries/analysis.htm figure 2. a sample questionpoint entry, this for a report by the national endowment for the arts 28 information technology and libraries | december 200828 information technology and libraries | december 2008 permanent print collection, or linked to from the library website. oclc library strives to inform users without subjecting them to information overload. when a 2007 survey of oclc staff found the library’s rss feeds seldom used, librarians began to send e-mails directly to individuals and teams. the reaction of oclc staff indicates that such personal messages, with content summaries that allow recipients to quickly evaluate the contents, are more often read than oclc library rss feeds—especially if items sent continue to be valuable. requirements that enable this kind of sharing include knowledge of company goals, staff needs, and product initiatives. to keep up-todate, librarians meet regularly with other oclc staff, and monitor organizational changes. attendance at oclc’s members council meetings provides information on hot topics that help identify resources for future use. while oclc’s growth as a global organization has brought challenges in maintaining awareness of the full range of organization needs, the questionpoint knowledge base offers a practical way to manage increased volume. maintaining resources of potential interest to staff with questionpoint has another benefit: it helps keep librarians aware of internal experts who can help the library with questions, and in many cases allows the library to connect staff with mutual interests to one another. this has become especially important as oclc has grown and its services continue to integrate with one another. conclusions beyond its usefulness as a system to receive, manage, and answer inquiries, questionpoint is providing a way to facilitate access to online resources that addresses the particular needs of oclc library’s constituency. it is fast and easy to use: a standard part of the daily workflow. it enables direct links to sources and accommodates tagging those sources with the names of people and projects, as well as subjects. it serves as part of the library’s collection management and selection system. using questionpoint in this way has some potential drawbacks. “just in case” acquisition of virtual resources entails some of the risks of traditional acquisitions: acquiring resources that are seldom used, creating a database of resources that are difficult to retrieve, and perhaps the necessity of “weeding” or updating obsolete items. with company growth comes the issue of scalability, as well. 
but for now, the benefits have far outweighed the risks. most of the items added have been identified for and shared with at least one staff member, so the effort has provided immediate payoff. the knowledge base serves as a collection development tool, helping to identify items that can be cataloged and added to the permanent collection; the record in the knowledge base can serve as a reminder to check for later editions; and the knowledge base records are easy to update or even delete. the questionpoint virtual vertical file helps oclc library manage and share those useful things that "just turn up." references 1. "the vertical file for pamphlets and miscellany," wilson bulletin 1, no. 16 (june 1919): 351. 2. kate louise roberts, "vertical file," public libraries 12 (oct. 1907): 316–17. editor's notes goodbye jola it is with mixed emotions that we note that this is the last issue of the journal of library automation. the first issue appeared in march 1968, just shortly after this editor had graduated from library school. under the editorships of frederick g. kilgour and susan k. martin, jola established itself as a major source of information about developments in library automation. this is also the last issue of the first volume produced by a new editorial board. the current editors are especially indebted to eileen mahoney of ala's central publication unit, whose experience, patience, and wise counsel contributed materially to making this last volume one we are all proud of. hello ital please welcome volume 1, number 1 of information technology and libraries when its bright new face appears on your doorstep in march. it will look very familiar to you. the new name reflects many of the shifts in emphasis that have gradually been introduced in recent years as changing technologies have encouraged a broadening of jola's original scope. we plan to introduce some minor changes to increase ital's utility, but see these as evolutionary. we continue to solicit comments and suggestions on how the journal can better serve your needs. synchronicity in our september issue, we initiated a new section, "reports and working papers," in which we reproduce documents we believe deserve a wider readership than their original distribution. we were amused to note a similar innovation in the august bulletin of the american society for information science. we would welcome comments on the usefulness (or wastefulness) of the new section. standards standards continue to be a major concern in our field. we hope those of you involved with acquisitions systems will find the communications by sandy paul and jim long in this issue useful. we encourage you to participate in standards development efforts when possible. please try to use developed standards whenever they are applicable to your work. the isbn, san (standard address number), sln (standard library number), and other standard numbers will become increasingly important as our systems become more interdependent in this shrinking world. book reviews basic fortran iv programming, by donald h. ford. homewood, illinois: richard d. irwin, inc., 1971. 254 pp. $7.95. fortran texts are now quite plentiful, so the main question in the reviewer's mind is: what does this book have to offer that no other book has? regrettably the answer must be nothing. there are many other good fortran books available. this has very little to distinguish it.
that is not to say that it is not a good book. the quality of the book is good, the text is very readable, and there has been very good attention to the examples and proofreading. the book is suit able for an introductory course, or for self study. it does not go completely into all the features of the language, as these are usually best left to the specific manuals relating to the machines available. the book does bring the student to a level where he will be able to use those manuals and the level where he will need to use those manuals. the book does come to the level necessary for the person who writes his programs with professional assistance. the author has chosen ansi basic fortran iv to be discussed in the book. in particular he relates this to the ibm/360 and 370 computers. this is a common language and is available on most machines with only minor modifications. this was a good choice for the level of book he intended to write, since he didn't want to go into the advanced features of the language. the author goes quickly to the heart of the matter in fortran programming, so that the reader can start using the computer right away. the basic material is well covered and gives a good introduction to the more advanced features which are available on most machines. the examples are well chosen so that they do not require any specialized knowledge ; therefore the emphasis can be put on the programming aspects of the examples. he also has very good end-ofchapter problems, ranging in difficulty from straight repetition of text material to programming problems which will require a considerable amount of individual work. he has a good discussion of mixed mode arithmetic, one of the more difficult topics of fortran to explain. he also has a good discussion of input/output operations, and an explanation of formatting which is very good. this again is a difficult area of the language and has been well explained. discussing each of the statement types in fortran, he begins by giving the general form of the statement in a standardized way, which is very good for introductory purposes and for review and reference. the index in the book doesn't single these out, so somebody who wanted to use the book as a reference should make a self-index of these particular areas of the book where the general forms and statements are given. this is a good feature of the book. robert f. mathis book reviews 171 films: a marc format; specifications for magnetic tapes containing catalog records for motion pictures, filmstrips, and other pictorial media intended for pro;ection. washington: marc development office, 1970. 65 pp. $0.65. this latest format issued by the marc development office is similar in organiza tion to the previously issued formats, describing in tum the leader, record directory, control fields , and variable fields. three appendices give the variable field tags , indicators, and subfield codes applicable to this format , categories of films , and a sample record in the marc format. in addition to the motion pictures and filmstrips specified in the subtitle, the coverage of this format includes slides, transparencies, video tapes, and electronic video recordings. data elements describing these last two have not been defined completely as the marc development office feels that further investigation is needed in these areas. the bibliographic level for this format is for monograph material, i.e., material complete at time of issue or to be issued in a known number of parts . 
since most of the material covered by this format is entered under title, main entry fields ( 100, 110, 111, 130 ) have not been described. this exclusion also covers the equivalent fields in the 400s and 800s. main entry and other fields not listed in this format but required by a user can be obtained from books: a marc format. this format describes two kinds of data: that generally found on an lc printed card and that needed to describe films in archival collections. only the first category will be distributed in machine readable form on a regular basis. one innovation introduced in this format that can only be applauded by marc users is the adoption of the bnb practice of using the second indicator of title fields (241, 245, 440, 840, but not 740 where the second indicator had previously been assigned a different function) to specify the number of characters at the beginning of the entry which are to be ignored in filing. it is to be hoped that in the future this practice will be applied to books, serials, and other types of works as well as to films. judith hopkins u.k. marc pmiect, edited by a. e. jeffreys and t. d. wilson. newcastle upon tyne: oriel press, 1970. 116 pp. 25s. this volume, which reports the proceedings of a conference on the u.k. marc project held in march 1969, may be of as much interest in the usa as in britain; although the intake of british libraries is much smaller and the money available for experiments much less, the problems of developing and using marc effectively within these constraints are for this very reason of special interest. 172 journal of library automation vol. 4/3 september, 1971 a. j. wells opened the conference with a paper introducing u.k. marc and closed it with a paper stating its relationship to the british national bibliography. points of interest are the need for standardisation among libraries (not smprisingly, this theme occurs throughout) and the differences between u.k. marc and l.c. marc (the latter being the odd one out, in its departures from aacr 67). disappointingly, no hint is given of additional national bibliographical products that might come from marc, such as cumulated and updated bibliographies on given subjects, or listings of children's books, etc. richard coward, with his usual clarity and conciseness, explains the planning and format of u.k. marc, in which he has been so centrally involved. as he says, "we have the technology to produce a marc service but we really need a higher level of technology to use it at anything like its full potential." r. bayly's paper on "user programs and package deals" is disappointing, dealing only with icl 1900 computers, and not comprehensively or clearly even with them. two papers discuss the problems of actually using marc: e. h. c . driver's "why marc?", which concludes that "the most efficient use of marc will be made by large library sys tems or groups of libraries," and f . h. ayres' "marc in a special library environment," which concludes that eventually all libraries will use the marc tape. mr. ayres discusses the proposed use of marc at a wre aldermaston, and also gives a general (and highly optimistic ) blueprint of the sort of way marc could be used in an all-through selection, acquisition and cataloging system. (the four american experimental uses of marc reviewed by c. d. batty-at toronto, yale, rice and indianaare probably well enough known in the usa and canada.) 
keith davidson's discussion of filing problems is first class, and his paper is just as topical as when it was written, because little progress has been made since then. peter lewis, in "marc and the future in libraries," makes the point that whereas bnb cards provided a ready-made product for libraries, marc tapes will merely offer them a set of parts to put together themselves. of special interest to american audiences may be derek austin's paper, "subject retrieval in the u.k. marc," since the precis system to which it forms an introduction may represent a major breakthrough in machine-manipulable subject indexing. marc and its uses constitute one of the most rapidly developing areas of librarianship. regular conferences of this standard are needed to review progress from time to time. maurice b. line

google scholar and 100 percent availability of information, by jeffrey pomerantz (pomerantz@unc.edu), assistant professor in the school of information and library science, university of north carolina at chapel hill. this paper discusses google scholar as an extension of kilgour's goal to improve the availability of information. kilgour was instrumental in the early development of the online library catalog, and he proposed passage retrieval to aid in information seeking. google scholar is a direct descendent of these technologies foreseen by kilgour. google scholar holds promise as a means for libraries to expand their reach to new user communities, and to enable libraries to provide quality resources to users during their online search process. editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital. fred kilgour would probably approve of google scholar. kilgour wrote that the paramount goal of his professional career is "improving the availability of information."1 he wrote about his goal of achieving this increase through shared electronic cataloging, and even argued that shared electronic cataloging will move libraries toward the goal of 100 percent availability of information.2 throughout much of kilgour's life, 100 percent availability of information meant that all of a library's books would be on the shelves when a user needed them. in proposing shared electronic cataloging—in other words, online union catalogs—kilgour was proposing that users could identify libraries' holdings without having to travel to the library to use the card catalog. this would make the holdings of remote libraries as visible to users as the holdings of their local library. kilgour went further than this, however, and also proposed that the full text of books could be made available to users electronically.3 this would move libraries toward the goal of 100 percent availability of information even more than online union catalogs. an electronic resource, unlike physical items, is never checked out; it may, in theory, be simultaneously used by an unlimited number of users. where there are restrictions on the number of users of an electronic resource—as with subscription services such as netlibrary, for example—this is not a necessary limitation of the technology, but rather a limitation imposed by licensing and legal arrangements. kilgour understood that his goal of 100 percent availability of information would only be reached by leveraging increasingly powerful technologies.
the existence of effective search tools and the usability of those tools would be crucial so that the user would be able to locate available information without assistance.4 to achieve this goal, therefore, kilgour proposed and was instrumental in the early development of much library automation: he was behind the first uses of punched cards for keeping circulation records, he was behind the development of the first online union catalog, and he called for passage retrieval for information seeking at a time when such systems were first being developed.5 this development and application of technology was all directed toward the goal of improving the availability of information. kilgour stated that the goal of these proposed information-retrieval and other systems was "to supply the user with the information he requires, and only that information."6 shared catalogs and electronically available text have the effect of removing both spatial and temporal barriers between the user and the material being used. when the user can access materials "from a personal microcomputer that may be located in a home, dormitory, office, or school," the user no longer has to physically go to the library.7 this is a spatial barrier when the library is located at some distance from the user, or if the user is physically constrained in some way. even if the user is perfectly able-bodied, however, and located close to a library, electronic access still eliminates a temporal barrier: accessing materials online is frequently faster and more convenient than physically going to the library. electronic access enables 100 percent availability of information in two ways: by ensuring that the material is available when the user wants it, and by lowering or removing any actual or perceived barriers to the user accessing the material. ■ library automation weise writes that "for at least the last twenty to thirty years, we [librarians] have done our best to provide them [users] with services so they won't have to come to the library."8 the services that weise is referring to are the ability for users to search for and gain access to the full text of materials online. libraries of all types have widely adopted these services: for example, at the author's own institution, the university of north carolina at chapel hill, the libraries have subscriptions to approximately seven hundred databases and provide access to more than 32,000 unique periodical titles; many of these subscriptions provide access to the full text of materials.9 additionally, the state library of north carolina provides a set of more than one hundred database subscriptions to all academic and public libraries around the state; any north carolina resident with a library card may access these databases.10 several other states have similar programs. by providing users with remote access to materials, libraries have created an environment in which it is possible for users to be remote from the library.
or rather, as lipow points out, it is the library that is remote from the user, yet the user is able to seek and find information.11 this adoption of technology by libraries has had the effect of enabling and empowering users to seek information for themselves, without either physically going to a library or seeking a librarian’s assistance. the increasing sophistication of freely available tools for information seeking on the web has accelerated this trend. in many cases, users may seek information for themselves online without making any use of a library’s human-intermediated or other traditional services. (certainly, providing access to electronic collections may be considered to be a service of the library, but this is a service that may not require the user either to be physically in the library or to communicate with a librarian.) even technically unsophisticated users may use a search engine and locate information that is “good enough” to fulfill their information needs, even if it is not the ideal or most complete information for those purposes.12 thus, for better or worse, the physical library is no longer the primary focus for many information seekers. part of this movement by users toward self-sufficiency in information seeking is due to the success of the web search engine, and to the success of google in particular. recent reports from the pew internet and american life project shed a great deal of light on users’ use of these tools. rainie and horrigan found that “on a typical day at the end of 2004, some 70 million american adults logged onto the internet.”13 fallows found that “on any given day, 56% of those online use search engines.”14 fallows, rainie, and mudd found that of their respondents, “47% say that google is their top choice of search engine.”15 from these figures, it can be roughly estimated that more than 39 million people use search engines, and more than 18 million use google on any given day—and that is only within the united states. this trend seems quite dark for libraries, but it actually has its bright side. it is important to make a distinction here between use of a search engine and use of a reference service or other library service. there is some evidence that users’ questions to library reference services are becoming more complex.16 why this is occurring is less clear, but it may be hypothesized that users are locating information that is good enough to answer their own simple questions using search engines or other internet-based tools. the definition of “good enough” may differ considerably between a user and a librarian. nevertheless, one function of the library is education, and as with all education, the ultimate goal is to make the student self-sufficient in self-teaching. in the context of the library, this means that one goal is to make the user self-sufficient in finding, evaluating, and using information resources. if users are answering their own simple questions, and asking the more difficult questions, then it may be hypothesized that the widespread use of search engines has had a role in raising the level of debate, so to speak, in libraries. rather than providing instruction to users on simply using search engines, librarians may now assume that some percentage of library users possess this skill, and may focus on teaching higher-level information-literacy skills to users (www.ala.org/ala/acrl/ acrlstandards/informationliteracycompetency.htm). 
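the rough totals quoted in the paragraph above (more than 39 million search-engine users and more than 18 million google users on a given day) follow directly from the three pew figures; the short sketch below simply chains the percentages together, a simplifying assumption made here for illustration rather than one the pew reports state themselves:

```python
# back-of-the-envelope arithmetic behind the estimates quoted above
online_adults_per_day = 70_000_000   # rainie and horrigan: u.s. adults online on a typical day
share_using_search = 0.56            # fallows: share of those online who use a search engine
share_choosing_google = 0.47         # fallows, rainie, and mudd: search users naming google as top choice

search_engine_users = online_adults_per_day * share_using_search     # about 39.2 million
google_users = search_engine_users * share_choosing_google           # about 18.4 million

print(f"estimated search-engine users per day: {search_engine_users:,.0f}")
print(f"estimated google users per day:        {google_users:,.0f}")
```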
simple questions that users may answer for themselves using a search engine, and complex questions requiring a librarian's assistance to answer, are not opposites, of course, but rather two ends of a spectrum of the complexity of questions. while the advance of online search tools may enable users to seek and find information for themselves at one end of this spectrum, it seems unlikely that such tools will enable users to do the same across the entire spectrum any time soon, perhaps ever. the author believes that there will continue to be a role for librarians in assisting users to find, evaluate, and use information. it is also important to make another distinction here, between the discovery of resources and access to those resources. libraries have always provided mechanisms for users to both discover and access resources. neither the card catalog nor the online catalog contains the full text of the materials cataloged; rather, these tools are means to enable the user to discover the existence of resources. the user may then access these resources by visiting the library. search engines, similar to the card and online catalogs, are tools primarily for discovery of resources: search-engine databases may contain cached copies of web pages, but the original (and most up-to-date) version of the web page resides elsewhere on the web. thus, a search engine enables the user to discover the existence of web pages, but the user must then access those web pages elsewhere. the author believes that there will continue to be a role for libraries in providing access to resources—regardless of where the user has discovered those resources. in order to ensure that libraries and librarians remain a critical part of the user's information-seeking process, however, libraries must reappropriate technologies for online information seeking. search engines may exist separate from libraries, and users may use them without making use of any library service. however, libraries are already the venue through which users access much online content—newspapers, journals, and other periodicals; reference sources; genealogical materials—even if many users do not physically come to the library or consult a librarian when using them. it is possible for libraries to add value to search technologies by providing a layer of service available to those using them. ■ google scholar one such technology for online information seeking to which libraries are already adding value, and that could add value to libraries in turn, is google scholar (scholar.google.com). google scholar is a specialty search tool, obviously provided by google, which enables the user to search for scholarly literature online. this literature may be on the free web (as open-access publications become more common and as scholars increasingly post preprint or post-print copies of their work on their personal web sites), or it may be in subscription databases.17 users may access literature in subscription databases in one of two ways: (1) if the user is affiliated with an institution that subscribes to the database, the user may access it via whatever authentication method is in place at the institution (e.g., ip authentication, a proxy server), or (2) if the user is not affiliated with such an institution, the user may pay for access to individual resources on a pay-per-view basis.
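the proxy route mentioned in (1) is commonly implemented by rewriting the publisher's url so that the request passes through the library's proxy server, which authenticates the user and then forwards the request from a licensed address range. the sketch below is illustrative only: the proxy host, login path, and publisher url are invented, and real installations each have their own url pattern.

```python
from urllib.parse import quote

# hypothetical proxy host and login path; substitute an institution's actual values
PROXY_PREFIX = "https://libproxy.example.edu/login?url="

def proxied(target_url: str) -> str:
    """rewrite a publisher url so the request is routed through the library proxy."""
    return PROXY_PREFIX + quote(target_url, safe="")

# an off-campus user follows the rewritten link instead of the raw publisher link
print(proxied("https://www.example-publisher.com/journals/article/12345"))
# -> https://libproxy.example.edu/login?url=https%3A%2F%2Fwww.example-publisher.com%2Fjournals%2Farticle%2F12345
```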
there is not sufficient space here to explore the details of google scholar’s operation, and anyway that is not the point of this paper; for excellent discussions of the operation of google scholar, see gardner and eng, and jacsó.18 pace draws a distinction between federated searching and metasearching: federated search tools compile and index all resources proactively, prior to any user’s actual search, in a just-in-case approach to users’ searching.19 metasearch tools, on the other hand, search all resources on the fly at the time of a user’s search, in a just-in-time approach to users’ searching. google scholar is a federated search tool—as, indeed, are all of google’s current services—in that the database that the user searches is compiled prior to the user’s actual search. in this, google scholar is a direct descendent of kilgour’s work to develop shared online library catalogs. a shared library catalog is a union catalog: it is a database of libraries’ physical holdings, compiled prior to any actual user’s search. google scholar is also a union catalog, though a catalog of publishers’ electronic offerings provided by libraries, rather than of libraries’ physical holdings. it should be noted, however, that while this difference is an important one for libraries and publishers, it might not be understood or even relevant for many users. many of the resources indexed in google scholar are also available in full text. this fact allows google scholar to also move in the direction of kilgour’s goal of making passage retrieval possible for scholarly work. by using google’s core technology—the search engine and the inverted index that is created when pages are indexed by a search engine—google scholar enables full-text searching of scholarly work. as mentioned above, when users search google scholar, they retrieve a set of links to the scholarly literature retrieved by the search. google scholar also makes use of google’s linkanalysis algorithms to analyze the network of citations between publications—instead of the network of hyperlinks between web pages, as google’s search engine more typically analyzes. a cited by link is included with each retrieved link in google scholar, stating how many other publications cite the publication listed. clicking on this cited by link performs a preformulated search for those publications. this citation-analysis functionality resembles the functionality of one of the most common and widely used scholarly databases in the scholarly community: the isi web of science (wos) database (scientific .thomson.com/products/wos). wos enables users to track citations between publications. this functionality has wide use in scholarly research, but until google scholar, it has been largely unknown outside of the scholarly community. with the advent of google scholar, however, this functionality may be employed by any user for any research. further, there is a plugin for the firefox browser (www.mozilla.com/firefox) that displays an icon for every record on the page of retrieved results that links to the appropriate record in the library’s opac (google scholar does not, however, currently provide this functionality natively20). this provides a link from google scholar to the materials that the library holds in its collection. when the item is a book, for example, this link to the opac enables users to find the call number of the book in their local library. 
when the item is a journal, it enables them to find both the call number and any database subscriptions that index that journal title. periodicals are often indexed in multiple databases, so libraries with multiple-database subscriptions often have multiple means of accessing electronic versions of journal titles. a library user may access a periodical via any or all of these individual subscriptions without using google scholar—but to do so, the user must know which database to use, which means knowing either the topical scope of a database or knowing which specific journals are indexed in a database. as a more centralized means of accessing this material, many users may prefer a link in google scholar to the library's opac. google scholar thus fulfills, in large part, kilgour's vision of shared electronic cataloging. in turn, shared cataloging goes a long way toward achieving kilgour's vision of 100 percent availability of information by allowing a user to discover the existence of information resources. however, discovery of resources is only half of the equation: the other half is access to those resources. and it is here that libraries may position themselves as a critical part of the information-seeking process. search engines may enable users to discover information resources on their own, without making use of a library's services, but it is the library that provides the "last mile" of service, enabling users to gain access to many of those resources. ■ conclusion google scholar is the topic of a great deal of debate, both in the library arena and elsewhere.21 unlike union catalogs and many other online resources used in libraries, it is unknown what materials are included in google scholar, since as of this writing google has not released information about which publishers, titles, and dates are indexed.22 google is known to engage in self-censorship—or self-filtering, depending on what coverage one reads—and so potentially conflicts with the american library association's freedom to read statement (www.ala.org/ala/oif/statementspols/ftrstatement/freedomreadstatement.htm).23 google is a commercial entity and, as such, its primary motivation must be profit, with meeting the information needs of library users only a secondary concern. for all of these and other reasons, there is considerable debate among librarians about whether it is appropriate for libraries to provide access to google scholar. despite this debate, however, users are using google scholar. google scholar is simply the latest tool to enable users to seek information for themselves; it isn't the first and it won't be the last. google scholar holds a great deal of promise for libraries due to the combination of google's popularity and ease of use, and the resources held by or subscribed to by libraries to which google scholar points. as kesselman and watstein suggest, "libraries and librarians need to have a voice" in how tools such as google scholar are used, given that "we are the ones most passionate about meeting the information needs of our users."24 given that library users are using google scholar, it is to libraries' benefit to see that it is used well. google scholar is the latest tool in a long history of information-seeking technologies that increasingly realize kilgour's goal of achieving 100 percent availability of information.
google scholar does not provide access to 100 percent of information resources in existence; but rather enables discovery of information resources, and allows for the possibility that these resources will be discoverable by the user 100 percent of the time. google scholar may be on the vanguard of a new way of integrating library services into users’ everyday information-seeking habits. as taylor tells us, people have their own individual sources to which they go to find information, and libraries—for many people—are not at the top of their lists.25 google, however, is at the top of the list for a great many people.26 properly harnessed by libraries, therefore, google scholar has the potential to bring users to library resources when they are seeking information. google scholar may not bring users physically to the library. instead, what google scholar can do is bring users into contact with resources provided by the library. this is an important distinction, because it reinforces a change that libraries have been undergoing since the advent of the online database: that of providing access to materials that the library may not own. ownership of materials potentially allows for a greater measure of control over the materials and their use. ownership in the context of libraries has traditionally meant ownership of physical materials, and physical materials by nature restrict use, since the user must be physically collocated with the materials, and use of materials by one user precludes use of those materials by other users for the duration of the use. providing access to materials, on the other hand, means that the library may have less control over materials and their use, but this potentially allows for wider use of these materials. by enabling users to come into contact with library resources in the course of their ordinary web searches, google scholar has the potential to ensure that libraries remain a critical part of the user’s information-seeking process. it benefits google when a library participates with google scholar, but it also benefits the library and the library’s users: the library is able to provide users with a familiar and easy-to-use path to materials. this is (for lack of a better term) a “spoonful of sugar” approach to seeking and finding information resources: by using an interface that is familiar to users, libraries may provide quality information sources in response to users’ information seeking. green wrote that “a librarian should be as unwilling to allow an inquirer to leave the library with his question unanswered as a shop-keeper is to have a customer go out of his store without making a purchase.”27 a modern version of this might be that a librarian should be as unwilling to allow an inquirer to abandon a search with his question unanswered. google scholar and online tools like it have the potential to draw users away from libraries; however, these tools also have the potential to usher in a new era of service for libraries: an expansion of the reach of libraries to new users and user communities; a closer integration with users’ searches for information; and the provision of quality resources to all users, in response to all information needs. google scholar and online tools like it have the potential to enable libraries to realize kilgour ’s goals of improving the availability of information, and to provide 100 percent availability of information. these are goals on which all libraries can agree. 
■ acknowledgements many thanks to lisa norberg, instruction librarian, and timothy shearer, systems librarian, both at the university of north carolina at chapel hill, for many extensive conversations about google scholar, which approached coauthorship of this paper. this paper is dedicated to the memory of kenneth d. shearer. references and notes 1. frederick g. kilgour, "historical note: a personalized prehistory of oclc," journal of the american society for information science 38, no. 5 (1987): 381. 2. frederick g. kilgour, "future of library computerization," in current trends in library automation: papers presented at a workshop sponsored by the urban libraries council in cooperation with the cleveland public library, alex ladenson, ed. (chicago: urban libraries council, 1981), 99–106; frederick g. kilgour, "toward 100 percent availability," library journal 114, no. 19 (1989): 50–53. 3. kilgour, "toward 100 percent availability." 4. frederick g. kilgour, "lack of indexes in works on information science," journal of the american society for information science 44, no. 6 (1993): 364; frederick g. kilgour, "implications for the future of reference/information service," in collected papers of frederick g. kilgour: oclc years, lois l. yoakam, ed. (dublin, ohio: oclc online computer library center, inc., 1984): 9–15. 5. frederick g. kilgour, "a new punched card for circulation records," library journal 64, no. 4 (1939): 131–33; kilgour, "historical note"; frederick g. kilgour and nancy l. feder, "quotations referenced in scholarly monographs," journal of the american society for information science 43, no. 3 (1992): 266–70; gerald salton, j. allan, and chris buckley, "approaches to passage retrieval in full-text information systems," in proceedings of the 16th annual international acm sigir conference on research and development in information retrieval (new york: acm pr., 1993), 49–58. 6. kilgour, "implications for the future of reference/information service," 95. 7. kilgour, "toward 100 percent availability," 50. 8. frieda weise, "being there: the library as place," journal of the medical library association 92, no. 1 (2004): 10, www.pubmedcentral.nih.gov/articlerender.fcgi?artid=314099 (accessed apr. 9, 2006). 9. it is difficult to determine precise figures, as there is considerable overlap in coverage; several vendors provide access to some of the same periodicals. 10. north carolina's database subscriptions are via the nc live service, www.nclive.org (accessed apr. 9, 2006). 11. anne g. lipow, "serving the remote user: reference service in the digital environment," paper presented at the ninth australasian information online and on disc conference and exhibition, sydney, australia, 19–21 jan. 1999, www.csu.edu.au/special/online99/proceedings99/200.htm (accessed apr. 9, 2006). 12. j. janes, "academic reference: playing to our strengths," portal: libraries and the academy 4, no. 4 (2004): 533–36, http://muse.jhu.edu/journals/portal_libraries_and_the_academy/v004/4.4janes.html (accessed apr. 9, 2006). 13. lee rainie and john horrigan, a decade of adoption: how the internet has woven itself into american life (washington, d.c.: pew internet & american life project, 2005), 58, www.pewinternet.org/ppf/r/148/report_display.asp (accessed apr. 9, 2006). 14. deborah fallows, search engine users (washington, d.c.: pew internet & american life project, 2005), i, www.pewinternet.org/pdfs/pip_searchengine_users.pdf (accessed apr. 9, 2006). 15.
deborah fallows, lee rainie, and graham mudd, data memo on search engines (washington, d.c.: pew internet & american life project, 2004), 3, www.pewinternet.org/ppf/r/132/report_display.asp (accessed apr. 9, 2006). 16. laura bushallow-wilber, gemma devinney, and fritz whitcomb, "electronic mail reference service: a study," rq 35, no. 3 (1996): 359–69; carol tenopir and lisa a. ennis, "reference services in the new millennium," online 25, no. 4 (2001): 40–45. 17. alma swan and sheridan brown, open access self-archiving: an author study (truro, england: key perspectives, 2005), www.jisc.ac.uk/uploaded_documents/open%20access%20self%20archiving-an%20author%20study.pdf (accessed apr. 9, 2006). 18. susan gardner and susanna eng, "gaga over google? scholar in the social sciences," library hi tech news 8 (2005): 42–45; péter jacsó, "google scholar: the pros and the cons," online information review 29, no. 2 (2005): 208–14. 19. andrew pace, "introduction to metasearch . . . and the niso metasearch initiative," presentation to the openurl and metasearch workshop, sept. 19–21, 2005, www.niso.org/news/events_workshops/openurl-05-ppts/2-1-pace.ppt (accessed apr. 9, 2006). 20. this plugin was developed by peter binkley, digital initiatives technology librarian at the university of alberta. see www.ualberta.ca/~pbinkley/gso (accessed apr. 9, 2006). 21. see, for example, gardner and eng, "gaga over google?"; jacsó, "google scholar"; m. kesselman and s. b. watstein, "google scholar and libraries: point/counterpoint," reference services review 33, no. 4 (2005): 380–87. 22. jacsó, "google scholar." 23. anonymous, google censors itself for china, bbc news, jan. 25, 2006, http://news.bbc.co.uk/2/hi/technology/4645596.stm (accessed apr. 9, 2006); a. mclaughlin, "google in china," google blog, jan. 27, 2006, http://googleblog.blogspot.com/2006/01/google-in-china.html (accessed apr. 9, 2006). 24. kesselman and s. b. watstein, "google scholar and libraries," 386. 25. robert s. taylor, "question-negotiation and information seeking in libraries," college & research libraries 29, no. 3 (1968): 178–94. 26. fallows, rainie, and mudd, data memo on search engines. 27. samuel s. green, "personal relations between librarians and readers," american library journal 1, no. 2–3 (1876): 79.

book reviews the proceedings of the international conference on training for information work, rome, italy, 15th–19th november 1971, edited by georgette lubock. joint publication of the italian national information institute, rome and the international federation for documentation, the hague; f.i.d. publ. 486; sept. 1972, rome, 510 p. let's face it: there is something about any proceedings that elicits a very personal reaction in many of us: "here are papers that either a) got their authors a trip to the conference city; b) tell how we did good at our place; or c) unabashedly present h.b.i.'s (half-baked ideas)." i personally like proceedings that have many papers under category c); such papers make me think (or laugh). the great majority of papers in these rome proceedings fall basically under category b), i.e., 'how we done it good,' and some quite obviously under a), i.e., 'have paper will travel'; well, it was rome, italy, after all. however, there is a smattering of papers that fall under c), i.e., h.b.i.'s. so for those interested in the topic, these proceedings offer among other things some food for speculative thought.
for these other things let us start at the beginning. the contents consist of prefatory sections, one opening address, sixty-six papers, a set of twenty brief conclusions, three closing addresses, a summary of work at the conference, an author index, and a list of participants and authors' addresses. the papers are organized according to two major sessions: one on "training of information specialists" (nine invited and forty-two submitted papers) and another on "training of information users" (six invited and nine submitted papers). the larger number of papers on training of specialists vs. training of users probably represents a good assessment of real education interests in the field. the conference was truly international: authors came from four continents, twenty countries and four international organizations. most represented were: italy as host country with fifteen papers, usa with eight, great britain with seven, and france with six papers. the concern for information science education is indeed worldwide; however, if the presented papers are any measure, such education is in big trouble, because one is left with the impression that information science education is in some kind of limbo: the bases, relations, and directions are muddled or nonexistent. but then isn't all contemporary higher education in big trouble, and in limbo? the conceptions of what information science education is all about differ so widely from paper to paper that the question of this difference in itself could be a subject of the next conference. it is my impression that the differences are due to a) widely disparate preconceptions of the nature of "information problems," and b) incompetence of a number of authors in relation to the subjects. accomplishments in some other field or, even worse, a high administrative title does not necessarily make for competence in information science education. the proceedings offer a fascinating picture of information science education by countries and by various facets. they also offer frustration due to unbelievably unhygienic semantic conditions in the treatment of concepts, including a confusion from the outset of "training" and "education." the first business of the field should be toward clearing its own semantic pollution; such a conclusion can be derived even after a most cursory examination of the papers. my own choices for the three most interesting papers are:
- v. slamecka and p. zunde, "science and information: some implications for the education of scientists" (usa);
- s. j. malan, "the implications for south african education in library science in the light of developments in information science" (south africa);
- w. kunz and h. w. j. rittel, "an educational system for the information sciences" (germany).
the editing of the proceedings is exemplary; the editors and conference organizers worked hard and conscientiously. the proceedings also provide the best single source published so far from which one could gain a wide international overview not only of information science education but also of information science itself, including implicitly the problems the field faces. in this lies the main worth of the proceedings. tefko saracevic

computer processing of library files at durham university; an ordering and cataloging facility for a small collection using an ibm 360/67 machine. by r. n. oddy. durham, england: university library, 1971. 202p. £1.75.
the task of the book is to guide the reader in the use of the lfp (library file processing) system developed by the durham university library. the lfp system orders items and prints book catalogs in various sequences for a small collection of items with the aid of an electronic digital computer. the system is batch with card input and printed output; the programs are written in pl/1. "the lfp system was designed to be flexible and easy to operate for small files, and is less suitable for files larger than 10,000 items because there are then other problems which it does not attempt to solve." (p. 10). the book fulfills its assigned task well; it is an excellent example of explanations and instructions for the personnel charged with the day-to-day operations of the particular system described. the book includes excellent introductory chapters on job control language, how computers operate, file maintenance, etc. outside of the durham university library, however, the book has little use except as a model of a well done operations guide. kenneth j. bierman

"discovery" focus as impetus for organizational learning, by jennifer l. fabbi (jennifer.fabbi@unlv.edu), special assistant to the dean at the university of nevada las vegas libraries. the university of nevada las vegas libraries' focus on the concept of discovery and the tools and processes that enable our users to find information began with an organizational review of the libraries' technical services division. this article outlines the phases of this review and subsequent planning and organizational commitment to discovery. using the theoretical lens of organizational learning, it highlights how the emerging focus on discovery has provided an impetus for genuine learning and change. the university of nevada las vegas (unlv) libraries' focus on the concept of discovery and the tools and processes that enable our users to find information stemmed from the confluence of several initiatives. however, a significant path that is directly responsible for the increased attention on discovery leads through one unit in unlv libraries—technical services. this unit, consisting of the materials ordering and receiving (acquisitions) and bibliographic and metadata services (cataloging) departments, had been without a permanent director for three years when i was asked to take the interim post in april 2008. while the initial expectation was that i would work with the staff to continue to keep technical services functioning while we performed our third search for a permanent director, it became clear after three months that, because of nevada's budgetary limitations, we would not be able to go forward with a search at that time. as all personnel searches in unlv libraries were frozen, managers and staff across the divisions moved quickly to reassign staff with the aim of mitigating the effects of staff vacancies. there was division between the library administrators as to what the solution would be for technical services: split up the division—for which we had had trouble recruiting and retaining a leader in the past—and divvy up its functions among other divisions in the libraries, or continue to hold down the fort while conducting a review of technical services that would inform what it might become in the future.
other organizations have taken serious looks at, and provided roadmaps of, how the focus of their technical services operations will change in the future.1 the latter route was chosen, and the review—eventually dubbed revisioning technical services—led directly to the inquiries and activities documented in this ital special issue. detailing the process of revisioning technical services and using the theoretical lens of organizational learning, i will demonstrate how the libraries' emerging focus on the concept of discovery has provided an impetus for genuine learning and change.

■ organizational learning

in images of organization, morgan devotes a chapter to theories of organizational development that characterize organizations using the metaphor of the brain.2 based on the principles of modern cybernetics, argyris and schön provide a framework for thinking about how organizations can learn to learn.3 while many organizations have become adept at single-loop learning—the ability to scan the environment, set objectives, and monitor their own general performance in relation to existing operating norms—these types of systems are generally designed to keep the organization "on course." double-loop learning, on the other hand, is a process of learning to learn, which depends on being able to take a "double look" at the situation by questioning the relevance of operating norms (see figure 1).

[figure 1. single- and double-loop learning. source: learning-org discussion pages, "single and double loop learning," learning-org dialog on learning organizations, http://www.learning-org.com/graphics/lo23374singledll.jpg (accessed aug. 11, 2009).]

bureaucratized organizations have fundamental organizing principles, including management hierarchy and subunit goals that are seen as ends in themselves, which can actually obstruct the learning process. to become skilled in the art of double-loop learning, organizations must avoid getting trapped in single-loop processes, especially those created by "traditional management control systems" and the "defensive routines" of organizational members.4 according to morgan, cybernetics suggests that learning organizations must develop capacities that allow them to do the following:5
- scan and anticipate change in the wider environment to detect significant variations by
  - embracing views of potential futures as well as of the present and the past;
  - understanding products and services from the customer's point of view; and
  - using, embracing, and creating uncertainty as a resource for new patterns of development.
- develop an ability to question, challenge, and change operating norms and assumptions by
  - challenging how they see and think about organizational reality using different templates and mental models;
  - making sure strategic development does not run ahead of organizational reality; and
  - developing a culture that supports change and risk taking.
- allow an appropriate strategic direction and pattern of organization to emerge by
  - developing a sense of vision, norms, values, limits, or "reference points" to guide behavior, including the ability to question the limits being imposed;
  - absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation; and
  - placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals.
unlv libraries' revisioning technical services process and the resulting organizational focus on discovery are outlined below, and the elements identifying unlv libraries as a learning organization throughout this process are highlighted (see appendix a).

■ revisioning technical services

this review of technical services was a process consisting of several distinct steps over many months, and each step was informed by the data and opinions gained in the prior steps:
- phase 1: technical services baseline, focusing on the nature of technical services work at unlv libraries, in the library profession, and factors that affect this work now and in the future
- phase 2: organizational call to action, engaging the entire organization in shared learning and input
- phase 3: summit on discovery, shifting significantly away from technical services and toward the concept of discovery of information and the experience of our users

technical services baseline. the first phase of the process, which i called the "technical services baseline," included a face-to-face meeting with me and all technical services staff. we talked openly about the challenges that we faced, options on the table for the division and why i thought that taking on this review would be the best course to pursue, and goals of the review. outcomes of the process were guided by the dean of libraries, were written by me, and received input from technical services staff, resulting in the following goals:
1. collect input about the kinds of skills and leadership we would like to see in our new technical services director. (while creating these goals, we were given the go-ahead to continue our search for a new director.)
2. investigate the organization of knowledge at a broad level—what is the added value that libraries provide?
3. increase overall knowledge of professional issues in technical services and what is most meaningful for us at unlv.
4. encourage technical services staff to consider current and future priorities.
after establishing these goals, i began to document information about the process on unlv libraries' staff website (figure 2) so that all staff could follow its progress.

[figure 2. project's wiki page on staff website.]

with the feedback i received at the face-to-face meeting and guided by the stated goals of the process, i gave technical services staff a series of three questions to answer individually:
1. what do you think the major functions of technical services are? examples are "cataloging physical materials" and "ordering and paying for all resources purchased from the collections budget."
2. what external factors—in librarianship and otherwise—should we be paying the most attention to in terms of their effect on technical services work? examples are "the ways that users look for information" and "reduction of print book and serials budgets." feel free to do a little research on this question and provide the sources of the information that you find.
3. what are the three highest priority/most important tasks on your to-do list right now?
eighteen of twenty staff members responded to the questions. i then analyzed the twenty pages of feedback according to two specific criteria: (1) i paid special attention to phrases that indicated an individual's beliefs, values, or philosophies to identify potential sources of conflict as we moved through the process; and (2) i looked for priority tasks listed that were not directly related to the individual's job duties, as many of them were indicators of work stress or anxiety related to perceived impending change. during this phase, organizational learning was initiated through the process of challenging how technical services staff and others viewed technical services as a unit in the organization, and through the creation of shared reference points to guide our future actions. while beginning a dialogue about a variety of future management options for technical services work functions may have raised levels of anxiety within the organization, it also invited administration and staff to question the status quo and consider alternative modes of operation within the context of efficiency.6 in addition to thinking about current realities and external influences, staff were asked to participate in generating outcomes to guide the review process. these shared goals helped to develop a sense of coherence for what started out as a very loose assignment—a review that would inform what the unit might become in the future.

organizational call to action. the next phase of the process, "a call to action," required library-wide involvement and input. while i knew that this phase would involve a library staff survey, i also desired that all staff responding to the survey have a basic knowledge of some of the issues that are facing library technical services today. using input from the two technical services department heads, i selected two readings for all library staff: bothmann and holmberg's chapter on strategic planning for electronic resource management addresses many of the planning, policy, and workflow issues that unlv libraries has experienced,7 and coyle's article on information organization and the future of the library catalog offers several ideas for ensuring that valuable information is visible to our users in the information environments they are using.8 i also asked the library staff to visit the university of nebraska–lincoln's "encore catalog search" (http://iris.unl.edu) and go through the discovery experience by performing a guided search and a search on a topic of their choice. they were then asked to ponder what collections of physical or digital resources we currently own at the libraries that are not available from the library catalog. after completing these steps, i directed library staff to a survey of questions related to the importance of several items referenced in the articles in terms of the following unlv libraries priorities:
- creating a single search interface for users pulling together information from the traditional library catalog as well as other resources (e.g., journal articles, images, archival materials)
- considering non–marc records in the library catalog for the integration of nontraditional library and nonlibrary resources into the catalog
- linking to access points for full-text resources from the catalog
- creating ways for the catalog to recommend items to users
- creating metadata for materials not found in the catalog
- creating "community" within the library catalog
- implementing an electronic resource management system (erms) to help manage the details related to subscriptions to electronic content
- implementing federated searching so that users can search across multiple electronic resource interfaces at once
- making electronic resource license information available to library staff and patrons
there also were several questions asking library staff to prioritize many of the functions that technical services already undertakes to some extent:
- cataloging specialized or unique materials
- cataloging and processing gift collections
- ensuring that full-text electronic access is represented accurately in the catalog
- claiming and binding print serials
- ordering and receiving physical resources
- ordering and receiving electronic resources
- maintaining and communicating acquisitions budget and serials data
the survey asked technical services staff to "think of your current top three priority to-do items. in light of what you read and what you think is important for us to focus on, how do you think your work now will have changed in five years?" all other library staff members were asked to respond to the following:
1. please list two ways that technical services supports your work now.
2. please list two things you would like technical services to start doing in support of your work now.
3. please list two things you think technical services can stop doing now.
4. please list two things technical services will need to begin doing to support your work in the next five years.
finally, the survey included ample opportunity for additional comments. fifty-eight staff members (over half of all library staff) completed the readings, activity, and survey. i analyzed the information to inform the design of subsequent phases of revisioning technical services. the dean of libraries' direct reports then reviewed the design. in addition, many library staff contributed additional readings and links to library catalogs and other websites to add to the revisioning technical services staff webpage. throughout this phase, the organization was invited into the learning process through engagement with shared reference points, the ability to question the status quo, and the ability to embrace views of potential futures as well as of the present and the past.9 the careful selection of shared readings and activities created coherence among the staff in terms of thinking about the future, but these ideas also raised many questions about the concept of discovery and what route unlv libraries might take. the survey allowed library staff to better understand current practices in technical services, to prioritize new ideas against these practices, and to think about future options and their potential impact on their individual work as well as the collective work of the libraries.

summit on discovery. in the third phase of this process, "the discovery summit," focus began to shift significantly from technical services as an organizational unit to the concept of discovery and what it means for the future of unlv libraries.
during this half-day event, employing a facilitator from off campus, the dean of libraries and i designed a program to fulfill the following desired outcome: through a process of focused inquiry, observation, and discussion, participants will more fully understand the discovery experience of unlv libraries users. the event was open to all library staff members; however, individuals were required to rsvp and complete an activity before the day of the event. (the facilitator worked specifically with the technical services staff at a retreat designed to prepare for upcoming interviews for technical services director candidates.) participants were each sent a "summit matrix" (see appendix b) ahead of time, which asked them to look for specific pieces of information by doing the following:
1. search for the information requested with three discovery tools as your starting points: the libraries' catalog, the libraries' website, and a general internet search engine (like google).
2. for each discovery tool, rate the information that you were able to find in terms of "ease of discovery" on a scale of 1 (lowest ease—few results) to 5 (highest ease—best results).
3. document the thoughts and feelings you had and/or process you went through in searching for this information.
4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to?
the information that staff members were asked to search for using each discovery tool was mostly specific to the region of southern nevada, such as, "i heard that henderson (a city in southern nevada) started as a mining community. does unlv libraries have any books about that?" and "find any photograph of the gay pride parade in las vegas that you can look at in unlv libraries." during the summit, the approximately sixty participants were asked to discuss their experiences searching for the matrix information, including any affective component to their experience, and they were asked to specify criteria for their definition of "ease of discovery." next, we showed end-user usability-testing video footage of a unlv professor, a human resources employee, and a unlv librarian going through similar discovery exercises. after each video, we discussed these users' experiences—their successes, failures, and frustrations—and the fact that even our experts were unable to discover some of this information. finally, we facilitated a robust brainstorming session on initiatives we could undertake to improve the discovery experience of our users. [editor's note: read more about this usability testing in "usability as a method for assessing discovery" on page 181 of this issue.] during the wrap-up of the discovery summit, the final phase of this initial process, the discovery mini-conference was introduced. a call for proposals for library staff to introduce or otherwise present discovery concepts to other library staff was distributed. this call tied together the revisioning technical services process to date and also connected the focus on discovery to the libraries' upcoming strategic planning process. this strategic planning process, outlining broad directions for the libraries to focus on for the next two years, would be the first time we would use our newly created evaluation framework. we focused on the concepts of discovery, access, and use, all tied together through an emphasis on the user.
all library staff members were invited to submit a poster session or other visual display on various themes related to discovery of information to add to our collective and individual knowledge bases and to better understand our colleagues' philosophies and positions on discovery. in addressing one of six mini-conference themes listed below, all drawn directly from the revisioning technical services survey results, potential participants were asked to consider the question, "what are your ideas for ways to improve how users find library resources?"
- single search interface (federated searching, harvester-type platform, etc.)
- open source vs. vendor infrastructure
- information-seeking behavior of different users
- social networking and web 2.0 features as related to discovery
- describing primary sources and other unique materials for discovery
- opening the library catalog for different record types and materials
proposals could include any of these perspectives:
- an environmental scan with a summary of what you learn
- a visual representation of what you would consider improvement or success
- a position for a specific approach or solution that you advocate
ultimately, we had seventeen distinct projects involving twenty-four staff members for the afternoon mini-conference. it was attended by approximately seventy additional staff members from unlv libraries as well as representatives from institutions that share our innovative system. we collected feedback on each project in written form and electronically after the mini-conference. mini-conference content was documented on its own wiki pages and in this special issue of ital. during this phase of the revisioning technical services process, there was an emphasis on understanding our services from the customers' point of view, a hallmark of a learning organization.10 during the discovery summit, we aimed to transform frustration and uncertainty over the user experience of the services we are providing into a motivation to embrace potential futures. the mini-conference utilized the discovery themes that had evolved throughout the revisioning technical services process to provide a cohesive framework for library staff members to share their knowledge and ideas about discovery systems and to question the status quo.

■ organizational ownership of discovery: strategic planning and beyond

through the phases of the revisioning technical services process outlined above, it should be evident how the concept of discovery, highlighted during the process, moved from being focused on technical services to being owned by the entire organization. while the vocabulary of discovery had previously been owned by pockets of staff throughout unlv libraries, it has now become a common lexicon for all. the libraries' evaluation framework, which includes discovery, had set the stage for our upcoming organizational strategic plan. just prior to the discovery summit, the dean of libraries' direct reports group began to discuss how it would create a strategic plan for the 2009–11 biennium. it became increasingly apparent how important a focus on discovery would be in this process, and that we needed to time our planning right, allowing the organization and ourselves time to become familiar with the potential activities we might commit to in this area before locking into a strategic plan.
“discovery” focus as impetus for organizational learning | fabbi 169 the dean’s direct reports group first spent time crafting a series of strategic directions to focus on in the two-year time period we were planning for. rather than give the organization specific activities to undertake, the strategic directions were meant to focus our new initiatives—and in a way to limit that activity to those that would move us past the status quo. of the sixteen directions, one stemmed directly from the organization’s focus on discovery: “improve discoverability of physical and electronic resources in empowering users to be self sufficient; work toward an interface and system architecture that incorporates our resources, internal and external, and allows the user to access them from their preferred starting point.” an additional direction also touched on the discovery concept: “monitor and adapt physical and virtual spaces to ensure they respond to and are informed by next-generation technologies, user expectations, and patterns in learning, social interactions, and research collaboration; encourage staff to experiment with, explore, and share innovative and creative applications of technology.” through their division directors and standing committees, all library staff members were subsequently given the opportunity to submit action items to the strategic plan within the framework of the strategic directions. the effort was made by the dean of libraries for this part of the process to coincide with the discovery mini-conference, a time when many library staff members were being exposed to a wide variety of potential activities that we might take as an organization in this area. one of the major action items that made it into the strategic plan was for the dean’s direct reports to charge an oversight task force with the investigation and recommendation of a systems or systems that would foster increased, unified discovery of library collections. the charge of this newly created discovery task force includes a set of guiding principles for the group in recommending a discovery solution that n creates a unified search interface for users pulling together information from the library catalog as well as other resources (e.g., journal articles, images, archival materials); n enhances discoverability of as broad a spectrum of library resources as possible; n is intuitive: minimizes the skills, time, and effort needed by our users to discover resources; n supports a high level of local customization (such as accommodating branding and usability considerations); n supports a high level of interoperability (easily connecting and exchanging data with other systems that are part of our information infrastructure); n demonstrates commitment to sustainability and future enhancements; and n is informed by preferred starting points of the user. in setting forth these guiding principles, the work of the discovery task force is informed by the organization’s discovery values, which have evolved over a year of organizational learning. in the timing of the strategic planning process and the emphasis of the plan, we made sure that the organization’s strategic development did not run ahead of organizational reality and also have worked to develop a culture that supports change and risk taking.11 the strategic discovery direction and pattern of organizational focus has been allowed to emerge throughout the organizational learning process. 
as evidenced in both the strategic plan directions and guiding principles laid out in the charge of the discovery task force, the organization has begun to absorb the basic philosophy that will guide appropriate objectives in this area and has focused more on this guiding philosophy than on the active pursuit of one right answer as it continues to learn. n conclusion using the theoretical lens of organizational learning, i have documented how unlv libraries’ emerging focus on the concept of discovery has provided an impetus for learning and change (see appendix a). our experience throughout this process supports the theory that organizational intelligence evolves over time and in reference to current operating norms.12 argyris and schön warn that a top-down approach to management focusing on control and clearly defined objectives encourages singleloop learning.13 had unlv libraries chosen a more management-oriented route at the beginning of this process, it most likely would have yielded an entirely different result. in this case, genuine organizational learning proved to be action based and ever-emerging, and while this is known to introduce some level of anxiety into an organization, the development of the ability to question, challenge, and potentially change operating norms has been worth the cost.14 i believe that while any single idea we have broached in the discovery arena may not be completely unique, it is the entire process of organizational learning that is significant and applicable to many information and technology-related areas of interest. references 1. karen calhoun, the changing nature of the catalog and its integration with other discovery tools (washington, d.c.: library 170 information technology and libraries | december 2009 scan and anticipate change in the wider environment to detect significant variations by n embracing views of potential futures as well as of the present and the past (revisioning phase 1: technical services questions); n understanding products and services from the customer’s point of view (revisioning phase 3: summit); and n using, embracing, and creating uncertainty as a resource for new patterns of development (revisioning phase 1: meeting; phase 3: summit). develop an ability to question, challenge, and change operating norms and assumptions by n challenging how they see and think about organizational reality using different templates and mental models (revisioning phase 2: survey); n making sure strategic development does not run ahead of organizational reality (strategic planning process; discovery task force charge); and n developing a culture that supports change and risk taking (strategic planning process). allow an appropriate strategic direction and pattern of organization to emerge by n developing a sense of vision, norms, values, limits, or “reference points” to guide behavior, including the ability to question the limits being imposed (revisioning phase 1: outcomes; phase 2: shared readings, activity; strategic planning process; discovery task force charge); n absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation (strategic planning process, discovery task force charge); and n placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals (strategic planning process, discovery task force charge). of congress, 2006), http://www.loc.gov/catdir/calhoun-report -final.pdf (accessed aug. 
12, 2009); bibliographic services task force, rethinking how we provide bibliographic services for the university of california (univ. of california libraries, 2005), http://libraries.universityofcalifornia.edu/sopag/bstf/final .pdf (accessed aug. 12, 2009). 2. gareth morgan, images of organization (thousand oaks, calif.: sage, 2006). 3. chris argyris and donald a. schön, organizational learning ii: theory, method, and practice (reading, mass.: addison wesley, 1996). 4. morgan, images of organization, 87. 5. morgan, images of organization, 87–97. 6. ibid. 7. robert l. bothmann and melissa holmberg, “strategic planning for electronic management,” in electronic resource management in libraries: research and practice, ed. holly yu and scott breivold, 16–28 (hershey, pa.: information science reference, 2008). 8. karen coyle, “the library catalog: some possible futures,” the journal of academic librarianship 33, no. 3 (2007): 414–16. 9. morgan, images of organization. 10. ibid. 11. ibid. 12. ibid. 13. argyris and schön, organizational learning ii. 14. morgan, images of organization. appendix a. tracking unlv libraries’ discovery focus across characteristics of organizational learning “discovery” focus as impetus for organizational learning | fabbi 171 please complete the following and bring to the summit on discovery—february 24: 1. search for the information requested in each row of the table below with three discovery tools as your starting points: the libraries catalog, the libraries website, and a general internet search engine (like google). 2. for each discovery tool, rate the information that you were able to find in terms of “ease of discovery” on a scale of 1 (lowest ease) to 5 (highest ease). 3. document the thoughts and feelings you had and/ or process you went through in searching for this information in the space provided. 4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to? appendix b. summit matrix what am i looking for? libraries catalog libraries website google thoughts, etc., on what i discovered what’s all the fuss about frazier hall? why is it important? does unlv libraries have any documents about the history of the university that reference it? it’s black history month and my professor wants me to find an oral history about african americans in las vegas that is available in unlv libraries. i heard that henderson started as a mining community. does unlv libraries have any books about that? find any photograph of the gay pride parade in las vegas that you can look at in unlv libraries. 106 information technology and libraries | september 2009 michelle frisquepresident’s message michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, north western university, chicago. b y the time you read this column i will be lita president, however, as i write this i still have a couple of weeks left in my vice-presidential year. i have been warned by so many that my presidential year will fly by, and i am beginning to understand how that could be. i can’t believe i am almost done with my first year. i have enjoyed it and sometimes been overwhelmed by it—especially when i began the process of appointing lita volunteers to committees and liaison roles. i didn’t realize how many appointments there were to make. i want to thank all of the lita members who volunteered. you really helped make the appointment process easier. 
as a volunteer organization, lita relies on you, and once again many of you have stepped up. thank you. during the appointment process i was introduced to many lita members whom i had not yet met. i enjoyed being introduced to you virtually, and i look forward to meeting you in person in the coming year. i also want to thank the lita office. they were there whenever i needed them. without their assistance i would not have been able to successfully complete the appointment process. over the last year i have been working closely with this year’s lita emerging leaders, lisa thomas and holly tomren. i have really enjoyed the experience. their enthusiasm and energy is contagious. i wish every lita member could have been at this year’s lita camp in columbus, ohio, on may 8. during one of the lightning round sessions, lisa went to the podium and gave an impassioned speech about the benefits of belonging to a professional organization like lita. if there was a person in the audience that was not yet a lita member, i am sure they joined immediately afterward. she really captured the essence of why i became active in lita and why i continue to stay so involved in this organization so many years later. i can honestly say that as much as i have given to lita, i have received so much more in return. that is the true benefit of lita membership. over the last year, the lita board has had some great discussions with lita members and leaders. those conversations will continue as we start the work of drafting a new strategic plan. i want to create a strategic plan that will chart a meaningful path for the association and its members for the next several years. i want it to provide direction but also be flexible enough to adapt to changes in the information technology association landscape. as andrew pace mentioned in his last president’s message, changes will be coming. while we still aren’t sure exactly what those changes are, we know that it is time to seriously look at the current organizational structure of lita to make sure it best fits our needs today while continuing to remain flexible enough to meet our needs tomorrow. when i think of the organizational changes we are exploring, i can’t help but think of the houses i see on my favorite home improvement shows. lita has good bones. the structure and foundation are solid and well built, and as long as the house is well cared for, should last for years to come. however, like all houses, improvements need to be made over time to keep up with the market. the lita structure and foundation will be the same. when you drive up to the house you will still recognize the lita structure. when you walk in the door my hope is that you will still get that same homey feeling you had before, maybe with a few “oohs” and “aahs” thrown in as you notice the upgrades and enhancements. as the year progresses we will know more. i will use this column and other communication avenues to keep you informed of our plans and to gather your input. i would like to close my first column by thanking you for giving me this opportunity to serve you as the lita president. i am honored and humbled by the trust you have placed in me, and i am ready to start my presidential year. i hope it does not go by too quickly. i want to savor the experience. now let’s get started! the importance of identifying and accommodating e-resource usage data for the presence of outliers. alain r. 
lamothe

alain r. lamothe (alomothe@laurentian.ca) is associate librarian, department of library and archives, laurentian university, sudbury, ontario, canada.

abstract

this article presents the results of a quantitative analysis examining the effects of abnormal and extreme values on e-journal usage statistics. detailed are the step-by-step procedures designed specifically to identify and remove these values, termed outliers. by greatly deviating from other values in a sample, outliers distort and contaminate the data. between 2010 and 2011, e-journal usage at laurentian university's j. n. desmarais library spiked because of illegal downloading. the identification and removal of outliers had a noticeable effect on e-journal usage levels: they represented more than 100,000 erroneously downloaded articles in 2010 and nearly 200,000 in 2011.

introduction

this article was written with two purposes in mind. first, it presents and discusses the results of a quantitative analysis that assessed how outlier values can influence usage statistics. second, and more important, it details the step-by-step procedures designed specifically to identify outliers and reduce their impact on the data. outliers are abnormal values that result in the corruption or contamination of data by artificially increasing or reducing average values.1 an outlier can thus be defined as a value that appears to greatly deviate from all other values in the sample,2 as an observation that seems to be inconsistent with the rest of the dataset,3 or as a very extreme observation requiring special attention because of the potential impact it may have on a summary of the data.4 outliers occur frequently in measurement data.5

the presence of outliers in usage data can significantly and negatively impact libraries. for libraries having e-resource subscription pricing based on usage statistics, the presence of outliers can contribute to unwarranted increases in subscription rates. for libraries that integrate e-resource usage statistics into their collection development and management practices, the presence of outliers can affect decisions on purchase, retention, or elimination of particular e-resources. evaluators can be fooled into thinking that a particular e-resource is heavily used and must be kept. further, the presence of extreme outliers is often the result of a malicious system intrusion,6 as was experienced by the j. n. desmarais library of laurentian university in sudbury, ontario, canada.7

between june 2010 and may 2011, e-journal usage at the j. n. desmarais library spiked after a four-year period of stable annual usage levels.8 between 2006 and 2010, the total number of full-text articles downloaded from the library's e-journal collection ranged between 640,000 and 720,000 annually, with an average of 700,000 articles downloaded per year. but in 2010 that number dramatically increased to more than 857,000 full-text articles downloaded. this was followed by an additional 870,000 full-text articles downloaded in 2011. then, as suddenly and inexplicably as the increase had occurred, usage levels returned to the same quantities recorded in the years prior to 2010. a total of 716,000 full-text articles were downloaded in 2012.
during this period of spiking usage the library received notifications and warnings from certain ejournal vendors of abnormally large numbers of full-text articles being downloaded over a relatively short period of time from the laurentian university ezproxy server’s ip address. this level of usage was a breach of license agreements. these vendors then proceeded to temporarily block laurentian university’s ezproxy access until they obtained assurances from the university that the offending accounts were no longer active. this action on the vendors’ part prevented any further suspected illegal downloading from occurring but also barred laurentian university students, staff, and faculty from authorized off-campus access. but not all vendors operated in this fashion and, unknown to the library at the time, full-text articles continued to be downloaded from other vendor sites in excessive amounts. either they were not monitoring excessive usage or they did not have the technical means to do so. regardless, in some cases certain e-journal titles recorded downloads thousands of times higher than normal. in some cases dozens of articles were being downloaded in seconds. the situation continued until late spring 2011, at which point it was discovered that confidential proxy account login information had been posted illegally on the web. with the login information of all compromised accounts now available, proxy managers were able to block their access at once, thereby ending the period of illegal downloading of laurentian university licensed material. web robots were suspected to have been involved. web robots, also referred to as internet bots or www robots, are automated software applications that run tasks on the web much as search engines do.9 they send requests to web servers to procure resources.10 some robots are developed with malicious intent and are designed to download entire websites for the purpose of copying the site,11 for autonomous logins to send spam,12 or for autonomous logins to steal confidential or copyright protected material.13 web robots specifically designed for the illegal procurement of copyright protected content are obviously of particular concern for libraries. unlawful downloading of full-text content occurs for many reasons. studies have clearly demonstrated that excessively high prices of digital content is a major drive for illegal downloads.14 misunderstanding and misinterpretation of copyright laws in addition to the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 33 unfamiliarity with and general apathy toward these same copyright laws further contribute to unlawful downloading of protected material.15 many students are unaware that the transmission of downloaded articles violates copyright laws and license agreements and often misunderstand the fair use aspect of copyright as meaning that the acquisition and distribution of licensed content for the purpose of education is allowed.16 in the minds of these students, distribution is permitted provided it is not for profit. 
librarians have also reported students systematically downloading all articles from recent journal issues not for the purpose of distribution or sale but rather to build their own personal collection.17 they are more concerned with obtaining resources quickly and completely rather than legally.18 aggravating the situation are students who firmly believe that by paying tuition they have permission to do whatever they wish with their institutions’ e-resources.19 some of these same students even use web robots to download as much as possible thereby saving them time and energy.20 they consider the downloaded item as their personal property. in fact, calluzzo and cante found that students displayed an ethical sense to personal property but became neutral if the property belonged to an enterprise.21 and solomon and o’brien found that 71 percent of students believed illegal copying to be a socially and ethically acceptable behavior.22 the j. n. desmarais library integrates e-resource usage into its collection development policy. as stated in the library’s collection development policy, “if the cost-per-use of an online resource is greater than the cost of an interlibrary loan for three consecutive years, this resource will be reviewed for cancellation.”23 in fact, this practice has been enforced for the past several years and has saved the library a considerable sum of money.24 for this reason, it is extremely important not to assume the accuracy of usage values without carefully examining the data. the artificial inflation of usage numbers could substantially cost the library if it was believed that an e-resource was beginning to experience an improvement in usage when, in actuality, it was not the case. the decision to keep this resource could cost the library tens of thousands of dollars before it was realized that the high number of searches or downloads recorded were not reflective of actual usage but were rather the result of data recording errors or illegal activity. regrettably, libraries will continue to deal with the consequences of copyright infringement, even if the library itself is not at fault. it is, however, important to recognize and understand that publishers are businesses and like any business, expect financial gain.25 even though e-resource piracy is currently very small, the risk of it becoming the single greatest threat to the industry is quite real. both music and film industries have been greatly affected by piracy for nearly two decades, and everyone witnessed the damaging effect it had. publishers have learned from this and will not allow it to happen to them as well.26 unfortunately for all parties involved, the nature of e-resources has made them extremely easy to copy.27 information technology and libraries | june 2014 34 method the following methodology will detail the step-by-step procedures to identify and deal with suspected outliers. all data manipulation and calculations were executed in microsoft excel for mac 2011 (version 14.3.2). all tables and figures were generated using the same version of excel. the first step is to identify suspected outliers by visually examining an entire usage dataset. a dataset is defined as a collection of related data corresponding to the contents of a single database table in which each column represents a particular variable and each row, a given member of the dataset in question.28 for this reason, the term dataset will be referred to in this paper as a grouping of data from any single spreadsheet. 
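the same first pass can also be scripted rather than done entirely by eye. the sketch below, in python, assumes the jr1 report has been saved as a spreadsheet with one row per journal title and one column per month; the file name, column layout, the pandas library, and the simple "many times the typical month" screen are all assumptions for illustration and are not the procedure used in this study, which relies on visual inspection followed by formal testing.

    import pandas as pd

    MONTHS = ["january", "february", "march", "april", "may", "june",
              "july", "august", "september", "october", "november", "december"]

    def flag_candidates(path, factor=10):
        # read one jr1-style spreadsheet: one row per journal, one column per month
        jr1 = pd.read_excel(path, index_col=0)
        candidates = []
        for title, row in jr1[MONTHS].iterrows():
            typical = row.median()
            for month, value in row.items():
                # flag months that dwarf the title's typical month; a rough screen
                # only, so every flagged value still needs to be tested formally
                if typical > 0 and value > factor * typical:
                    candidates.append((title, month, int(value)))
        return candidates

    # e.g. flag_candidates("jr1_2010.xlsx") would surface polymer's october
    # value of 15,123 against a typical month of a few dozen downloads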
each spreadsheet contains the number of full-text articles downloaded per year per vendor. each dataset was downloaded from vendors' sites as jr1 counter-compliant reports, which detail the number of successful full-text articles downloaded per month and per journal for a given year. all vendors provided jr1 counter-compliant reports that were downloaded as excel spreadsheets. each spreadsheet, or dataset, contained the list of e-journal titles and the number of articles downloaded for each title per month (see table 1). each dataset was then visually inspected in its entirety for suspected outliers.

journal title: january february march april may june july august september october november december
polymer: 12 15 26 33 38 64 39 5 13 15,123* 109 44
surface and coatings technology: 3 1 2 1 22 17 17 0 12 3,771* 5,428* 601
international journal of radiation oncology: 11 18 35 22 17 6,436* 176 13 25 29 24 19
journal of catalysis: 0 1 5 1 2 2 16 4 0 2 6,693* 1

table 1. sample from a 2010 jr1 counter-compliant report indicating the number of articles downloaded per journal over a twelve-month period. suspected outliers are marked with an asterisk.

since it was impractical to include the entire spreadsheet, table 1 provides an excerpt from a 2010 jr1 counter-compliant report containing five suspected outliers, each marked for identification with an asterisk. the first of these extreme values belongs to the title polymer and was recorded in october. compared to the other values for polymer, it stands out dramatically at 15,123 articles downloaded. the second and third extreme values belong to surface and coatings technology and are recorded for the months of october (3,771 downloads) and november (5,428 downloads). the fourth is the 6,436 articles downloaded in june from international journal of radiation oncology and the fifth from journal of catalysis in november (6,693 downloads). these five values greatly deviate from the other values recorded for each e-journal title. for polymer, the next highest value is 109 articles downloaded in november 2010, making the suspected outlier almost 14,000 percent greater.

now that the suspected outliers have been identified, they must be compared quantitatively to the rest of the values recorded for their corresponding titles, and only for their corresponding titles. for example, to test the probability that the value of 15,123 downloads recorded in october 2010 for polymer is indeed an outlier, the comparison must include all other 2010 polymer monthly values plus all other available polymer values. this is achieved by copying all 2010 polymer monthly values into a separate blank spreadsheet and then adding all other polymer monthly values from all other available years to that same spreadsheet (see table 2). this new spreadsheet can be labeled dataset 2, with dataset 1 being the original jr1 report downloaded from the vendor. suspected usage outliers from an e-journal need to be compared to other usage values of that particular title because each e-journal tends to be used differently. testing for an outlier by comparing it to the values of all other e-journals in the collection would be inaccurate; it would be like comparing apples to oranges.
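a short python sketch of this step (pandas, the yearly file names, and the jr1 layout of one row per title and one column per month are assumptions): it pulls every available monthly value for a single title into one list, the programmatic equivalent of the separate spreadsheet described above.

    import pandas as pd

    def build_dataset_2(title, paths):
        # gather every monthly value recorded for one e-journal title across
        # all available jr1 spreadsheets (one report per year)
        values = []
        for path in paths:
            jr1 = pd.read_excel(path, index_col=0)   # dataset 1 for that year
            if title in jr1.index:
                row = pd.to_numeric(jr1.loc[title], errors="coerce").dropna()
                values.extend(int(v) for v in row)
        return values

    # e.g. build_dataset_2("polymer", ["jr1_2009.xlsx", "jr1_2010.xlsx",
    #                                  "jr1_2011.xlsx", "jr1_2012.xlsx"])
    # would return the 48 monthly values laid out in table 2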
journal title and year: january february march april may june july august september october november december
polymer 2009: 27 14 35 22 15 28 24 19 11 8 13 7
polymer 2010: 12 15 26 33 38 64 39 5 13 15,123* 109 44
polymer 2011: 113 159 638 345 52 57 94 70 39 36 221 65
polymer 2012: 130 4 98 24 27 18 13 16 18 25 9 5

table 2. combining polymer's usage values from all available jr1 counter-compliant reports. the suspected outlier is marked with an asterisk.

table 2 provides the number of articles downloaded for the title polymer over a four-year period; these were the only jr1 reports available from the vendor. when the suspected outlier of 15,123 downloads is compared visually to the rest of the values in dataset 2, it again appears extreme: the next highest value is 638 articles downloaded during march 2011, making the suspected outlier 2,200 percent greater than the next highest value in the dataset. all further outlier testing and accommodating will be performed on this table.

the dixon q test was chosen to test for outliers. it is simple to use and designed to test for a small number of outliers in a dataset.29 the q value is calculated as the gap between the suspected outlier and the next value, divided by the range of values in the dataset: q = (outlier − next value) / (largest − smallest). the gap is the absolute difference between the outlier and the closest number to it. to facilitate the calculation, the data should be arranged in order of increasing value, with the smallest value at the front of the sequence and the largest value at the end. for example, using the data in table 2, the values are arranged beginning with 4, 5, 5, 7, . . . , 345, 638, and finally ending with 15,123. the calculation would thus be (15,123 − 638) / (15,123 − 4) = 0.9581. the calculated q value will be represented by the symbol qvalue from this point onward, making qvalue = 0.9581.

the next step is to compare the calculated qvalue to the critical values for q determined by verma and quiroz-ruiz.30 critical values correspond to a particular significance level and represent cutoff values that lead to the acceptance or rejection of a null hypothesis.31 the null hypothesis refers to the position in which there is no statistically significant relationship between two variables.32 the alternate hypothesis would thus be the existence of a relationship between two variables.33 if the calculated value is less than the critical value, the null hypothesis is accepted.34 on the other hand, if the calculated value is greater than the critical value, the null hypothesis is rejected.35 if the null hypothesis is rejected, then the alternate hypothesis must be accepted. here, the null hypothesis can be stated as "the suspected outlier is not an outlier," and the alternate hypothesis as "the suspected outlier is an outlier." therefore, if the null hypothesis is rejected, the suspected outlier is, in fact, considered to be an outlier.
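the q statistic itself is a one-line calculation. the short python sketch below, using the values from table 2, reproduces the figure of 0.9581 computed above before the formal comparison with the critical value.

    # the 48 monthly values for polymer taken from table 2 (2009-2012)
    polymer = [27, 14, 35, 22, 15, 28, 24, 19, 11, 8, 13, 7,         # 2009
               12, 15, 26, 33, 38, 64, 39, 5, 13, 15123, 109, 44,    # 2010
               113, 159, 638, 345, 52, 57, 94, 70, 39, 36, 221, 65,  # 2011
               130, 4, 98, 24, 27, 18, 13, 16, 18, 25, 9, 5]         # 2012

    def dixon_q_largest(values):
        # q = (largest - next largest) / (largest - smallest)
        ordered = sorted(values)
        return (ordered[-1] - ordered[-2]) / (ordered[-1] - ordered[0])

    q_value = dixon_q_largest(polymer)
    print(round(q_value, 4))   # 0.9581, matching the worked example above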
verma and quiroz-ruiz have calculated the critical value for q for a sample size of 48 and at a 95 percent confidence level to be qcritical = 0.2241.36 although operating at a 99 percent confidence level is a more conservative approach, it increases the likelihood of retaining a value that contains an error.37 operating at a 95 percent confidence level provides a reasonable compromise.38 if the calculated value is greater than the critical value, then the suspected outlier is confirmed to be an outlier. therefore, testing for the suspected outlier of 15,123, the q value was calculated to be qvalue = 0.9581. with qvalue = 0.9581 > qcritical = 0.2241, the null hypothesis is rejected and it must be accepted that 15,123 is an outlier.

once it is determined with statistical certainty that the suspected outlier is indeed an outlier, it needs to be replaced with the median calculated from all values found in dataset 2. for the case of polymer, the median was calculated to be 27 from all values in table 2. replacing an outlier with the median has proven to be quite effective in accommodating the data because it introduces less distortion to the dataset.39 extreme values are therefore replaced with values more consistent with the rest of the data.40

journal title and year: january february march april may june july august september october november december
polymer 2009: 27 14 35 22 15 28 24 19 11 8 13 7
polymer 2010: 12 15 26 33 38 64 39 5 13 27* 109 44
polymer 2011: 113 159 638 345 52 57 94 70 39 36 221 65
polymer 2012: 130 4 98 24 27 18 13 16 18 25 9 5

table 3. the identified outlier is replaced with the median (marked with an asterisk).

table 3 represents the number of full-text articles downloaded for polymer after the outlier had been replaced with the median. the confirmed outlier of 15,123 articles downloaded, recorded in october 2010, is replaced with the median of 27 (marked with an asterisk). this then becomes the accepted value for the number of articles downloaded from polymer in october 2010, and the outlier is discarded. the new value of 27 articles downloaded in october 2010 replaces the extreme value of 15,123 in the original 2010 jr1 report (see table 4). this is the final step.

journal title: january february march april may june july august september october november december
polymer: 12 15 26 33 38 64 39 5 13 27* 109 44
surface and coatings technology: 3 1 2 1 22 17 17 0 12 3,771 5,428 601
international journal of radiation oncology: 11 18 35 22 17 6,436 176 13 25 29 24 19
journal of catalysis: 0 1 5 1 2 2 16 4 0 2 6,693 1

table 4. sample from a 2010 jr1 counter-compliant report indicating the number of articles downloaded per journal over a twelve-month period. polymer's identified outlier is replaced with the median calculated from table 2 (marked with an asterisk).

once the first outlier is corrected, the same procedures need to be followed for the other suspected outliers marked in table 1. if it is determined that they are outliers, they are replaced with their associated median values. although the steps and calculations used to identify and correct for outliers are relatively simple to follow, it is admittedly a very lengthy and time-consuming process. but in the end, it is well worth the effort.

results and discussion

table 5 details the changes in the overall number of articles downloaded from j. n. desmarais library e-journals that resulted from the elimination of outliers.
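before turning to those results, the test decision and the accommodation step can be pulled together in one short python sketch. the critical value and the median of 27 are the ones reported above; the function itself is only an illustration and not the spreadsheet workflow used in this study.

    from statistics import median

    Q_CRITICAL = 0.2241   # verma and quiroz-ruiz, n = 48, 95 percent confidence level

    def accommodate_largest(values, q_critical=Q_CRITICAL):
        # test the largest value with the dixon q test and, if it is confirmed
        # as an outlier, return a copy of the data with it replaced by the median
        ordered = sorted(values)
        q_value = (ordered[-1] - ordered[-2]) / (ordered[-1] - ordered[0])
        if q_value > q_critical:          # reject the null hypothesis
            return [median(values) if v == ordered[-1] else v for v in values]
        return list(values)

    # for the polymer values above, qvalue = 0.9581 > 0.2241, so 15,123 is
    # replaced by the median of 27, the corrected value written back into the
    # 2010 jr1 report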
the column titled "recorded downloads" details the number of articles downloaded between 2000 and 2012, inclusively, prior to outlier testing. the column titled "corrected downloads" represents the number of articles downloaded during the same period of time but after the outliers had been positively identified and the data cleaned. the affected values are marked with an asterisk.

year: recorded downloads / corrected downloads
2000: 806 / 806
2001: 1,034 / 1,034
2002: 1,015 / 1,015
2003: 4,890 / 4,890
2004: 72,841 / 72,841
2005: 251,335 / 251,335
2006: 640,759 / 640,759
2007: 731,334 / 731,334
2008: 710,043 / 710,043
2009: 725,019 / 725,019
2010: 857,360 / 757,564*
2011: 869,651 / 696,973*
2012: 716,890 / 716,890

table 5. comparison of the recorded number of articles downloaded to the corrected number of articles downloaded, over a thirteen-year period.

all data from all available years were tested for outliers. only data recorded in 2010 and 2011 tested positive for outliers. replacing outliers with the median values for the affected journal titles dramatically reduced the total number of downloaded articles (see table 5). between 2007 and 2009, inclusively, the actual number of full-text articles downloaded from the library's e-journal collection totaled between 710,043 and 731,334 annually (see table 5). the annual average for those three years is 722,132 articles downloaded. but in 2010 that number dramatically increased to 857,360 downloaded articles, which was followed by 869,651 downloaded articles in 2011 (see table 5). the elimination of outliers from the 2010 data resulted in the number of downloads dropping from 857,360 to 757,564, a difference of 99,796 downloads, or 12 percent. similarly, in 2011, the number of articles downloaded decreased from 869,651 to 696,973 once outliers were replaced with median values. this represents a reduction of 172,678 downloaded articles, or 20 percent. a staggering 20 percent of articles downloaded in 2011 can therefore be considered erroneous and, in all likelihood, the result of illicit downloading.

figure 1 is a graphical representation of the change in the number of articles downloaded before and after the identification of outliers and their replacement by median values. the line "recorded downloads" clearly indicates a surge in usage between 2010 and 2011, with usage then returning to levels recorded prior to the 2010 increase. the line "corrected downloads" depicts a very different picture: the plateau in usage that began in 2007 continues through 2012. evidently, the observed spike in usage was artificial and the result of the presence of outliers in certain datasets. if the data had not been tested for outliers, it would have appeared that usage had substantially increased in 2010, and it would have been incorrectly assumed that usage was on the rise once more. instead, the corrected data bring usage levels for 2010 and 2011 back in line with the plateau that had begun in 2007 and reflect a more realistic picture of usage rates at laurentian university.

figure 1. comparing the recorded number of articles downloaded to the corrected number of articles downloaded over a thirteen-year period.

accuracy in any data gathering is always extremely important, but accuracy in e-resource usage levels is critical for academic libraries.
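the arithmetic behind those percentages is easy to re-run whenever a new year of data is added; the short python sketch below uses only the 2010 and 2011 totals from table 5 and reproduces the differences and shares cited above.

    recorded  = {2010: 857_360, 2011: 869_651}
    corrected = {2010: 757_564, 2011: 696_973}

    for year in (2010, 2011):
        removed = recorded[year] - corrected[year]
        print(year, removed, f"{removed / recorded[year]:.0%}")

    # 2010: 99,796 erroneous downloads removed, about 12 percent of the recorded total
    # 2011: 172,678 erroneous downloads removed, about 20 percent of the recorded total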
academic libraries having e-journal subscription rates based either entirely or partly on usage can be greatly affected if usage numbers have been artificially inflated. it can lead to unnecessary increases in cost. since it was determined that outliers were present only during the period in which the library had found itself under “attack,” it can be assumed that the vast majority, if not all, of the extreme usage values were a result of illegal downloading. it would therefore be a shame to need to pay higher costs because of inappropriate or illegal downloading of licensed content. accurate usage data is also important for academic libraries that integrate usage statistics into their collection development policy for the purpose of justifying the retention or cancellation of a particular subscription. the j. n. desmarais library is such a library. as indicated earlier, if the cost-per-download of a subscription is consistently greater than the cost of an interlibrary loan for three or more years, it is marked for cancellation. at the j. n. desmarais library, the average cost of an interlibrary loan had been previously calculated to be approximately can$15.00.42 therefore, subscriptions recording a “cost-per-download” greater than the can$15.00 target for more than three years can be eliminated from the collection. information technology and libraries | june 2014 40 any artificial increase in the number of downloads would have as result to artificially lower the cost-per-use ratio. this would reinforce the illusion that a particular subscription was used far more than it really was and lead to the false belief that it would be less expensive to retain rather than rely on interlibrary loan services. the true cost-per-use ratio may be far greater than initially calculated. the unnecessary retention of a subscription could prevent the acquisition of another, more relevant, one. for example, after adjusting the number of articles downloaded from sciencedirect in 2011, the cost-per-download ratio increased from can$0.74 to can$1.59, a 53 percent increase. for the j. n. desmarais library, this package was obviously not in jeopardy of being cancelled. but a 53 percent change in the cost-per-use ratio for borderline subscriptions would definitely have been affected. it must also be stated that none of the library’s subscriptions having experienced extreme downloading found themselves in the position of being cancelled after the usage data had been corrected for outliers. regardless, it is important to verify all usage data prior to any data analysis to identify and correct for outliers. once the outlier detection investigation has been completed and any extreme values replaced by the median, there would be no further need to manipulate the data in such a fashion. the identification of outliers is a one-time procedure. the corrected or cleaned datasets would then become the official datasets to be used for any further usage analyses. conclusions outliers can have a dramatic effect on the analysis of any dataset. as demonstrated here, the presence of outliers can lead to the misrepresentation of usage patterns. they can artificially inflate average values and introduce severe distortion to any dataset. fortunately, they are fairly easy to identify and remove. the following steps were used to identify outliers in jr1 countercompliant reports: 1. identify possible outliers: visually inspect the values recorded in a jr1 report dataset (dataset 1) and mark any extreme values. 2. 
for each suspected outlier identified, take the usage values for the affected e-journal title and incorporate them into a separate blank spreadsheet (dataset 2). incorporate into dataset 2 all other usage values for the affected journal from all available years. it is important that dataset 2 contain only those values for the affected journal. 3. test for the outlier: perform dixon q test on the suspected outlier to confirm or disprove existence of the outlier. 4. if the suspected outlier tests as positive, calculate the median of dataset 2. 5. replace the outlier in dataset 1 with the median calculated from dataset 2. 6. perform steps 1 through 5 for any other suspected outliers in dataset 1. 7. the corrected values in dataset 1 will become the official values and will be used for all subsequent usage data analysis. the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 41 the identification and removal of outliers had a noticeable effect on the usage statistics for j. n. desmarais library’s e-journal collection. outliers represented over 100,000 erroneous downloaded articles in 2010 and nearly 200,000 in 2011. a total of 20 percent of recorded downloads in 2011 were anomalous, and in all likelihood a result of illicit downloading after laurentian university’s ezproxy server was breached. new technologies have made digital content easily available on the web, which has caused serious concern for both publishers43 and institutions of higher learning, which have been experiencing an increase is illicit attacks.44 the history of napster supports the argument that users “will freely steal content when given the opportunity.”45 since web robot traffic will continue to grow in pace with the internet, it is critical that this traffic be factored into the performance and protection of any web servers.46 references 1. victoria j. hodge and jim austin, “a survey of outlier detection methodologies,” artificial intelligence review 85 (2004): 85–126, http://dx.doi.org/10.1023/b:aire.0000045502.10941.a9; patrick h. menold, ronald k. pearson, and frank allgöwer, “online outlier detection and removal,” in proceedings of the 7th mediterranean conference on control and automation (med99) haifa, israel—june 28-30, 1999 (haifa, israel: ieee, 1999): 1110–30. 2. hodge and austin, “a survey of outlier detection methodologies,” 85–126. 3. vic barnett and toby lewis, outliers in statistical data (new york: wiley, 1994). 4. hodge and austin, “a survey of outlier detection methodologies,” 85–126; r. s. witte and j. s. witte, statistics (new york: wiley, 2004); menold et al., “online outlier detection and removal,” 1110–30. 5. menold et al., “online outlier detection and removal,” 1110–30. 6. hodge and austin, “a survey of outlier detection methodologies,” 85–126. 7. laurentian university (sudbury, canada) is classified as a medium multi-campus university. total 2012 full-time student population was 6,863, of which 403 were enrolled in graduate programs. in addition, 2012 part-time student population was 2,652 with 428 enrolled in graduate programs. also in 2012, the university employed 399 full-time teaching and research faculty members. academic programs cover a multiple of fields in the sciences, social sciences, and humanities and offers 60 undergraduate, 17 master’s, and 7 doctoral degrees. 8. alain r. 
lamothe, “factors influencing usage of an electronic journal collection at a mediumsize university: an eleven-year study,” partnership: the canadian journal of library and information practice and research 7, no. 1 (2012), https://journal.lib.uoguelph.ca/index.php/perj/article/view/1472#.u36phvmsy0j. https://journal.lib.uoguelph.ca/index.php/perj/article/view/1472#.u36phvmsy0j information technology and libraries | june 2014 42 9. ben tremblay, “web bot—what is it? can it predict stuff?” daily common sense: scams, science and more (blog), january 24, 2008, http://www.dailycommonsense.com/web-botwhat-is-it-can-it-predict-stuff/. 10. derek doran and swapna s. gokhale, “web robot detection techniques: overview and limitations,” data mining and knowledge discovery 22 (2011): 183–210, http://dx.doi.org/10.1007/s10618-010-0180-z. 11. c. lee giles, yang sun, and isaac g. councill, “measuring the web crawler ethics,” in www 2010 proceedings of the 19th international conference on world wide web (raleigh, nc: international world wide web conferences steering committee, 2010): 1101–2, http://dx.doi.org/10.1145/17772690.1772824. 12. shinil kwon, kim young-gab, and sungdeok cha, “web robot detection based on patternmatching technique,” journal of information science 38 (2012): 118–26, http://dx.doi.org/10.1177/0165551511435969. 13. david watson, “the evolution of web application attacks,” network security (2007): 7–12, http://dx.doi.org/10.1016/s1353-4858(08)70039-4. 14. eric kin-wai lau, “factors motivating people toward pirated software,” qualitative market research 9 (2006): 404–19, http://dx.doi.org/1108/13522750610689113. 15. huan-chueh wu et al., “college students’ misunderstanding about copyright laws for digital library resources,” electronic library 28 (2010): 197–209, http://dx.doi.org/10.1108/02640471011033576. 16. ibid. 17. ibid. 18. emma mcculloch, “taking stock of open access: progress and issues,” library review 55 (2006): 337–43; c. patra, “introducing e-journal services: an experience,” electronic library 24 (2006): 820–31. 19. wu et al., “college students’ misunderstanding about copyright laws for digital library resources,” 197–209. 20. ibid. 21. vincent j. calluzzo and charles j. cante, “ethics in information technology and software use,” journal of business ethics 51 (2004): 301–12, http://dx.doi.org/10.1023/b:busi.0000032658.12032.4e. 22. s. l. solomon and j. a. o’brien “the effect of demographic factors on attitudes toward software piracy,” journal of computer information systems 30 (1990): 41–46. 23. j. n. desmarais library, “collection development policy” (sudbury, on: laurentian university, 2013), http://www.dailycommonsense.com/web-bot-what-is-it-can-it-predict-stuff/ http://www.dailycommonsense.com/web-bot-what-is-it-can-it-predict-stuff/ http://dx.doi.org/10.1007/s10618-010-0180-z http://dx.doi.org/10.1145/17772690.1772824 http://dx.doi.org/10.1177/0165551511435969 http://dx.doi.org/10.1016/s1353-4858(08)70039-4 http://dx.doi.org/1108/13522750610689113 http://dx.doi.org/10.1108/02640471011033576 http://dx.doi.org/10.1023/b:busi.0000032658.12032.4e the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 43 http://biblio.laurentian.ca/research/sites/default/files/pictures/collection%20development %20policy.pdf. 24. lamothe, “factors influencing usage”; alain r. 
lamothe, “electronic serials usage patterns as observed at a medium-size university: searches and full-text downloads,” partnership: the canadian journal of library and information practice and research 3, no. 1 (2008), https://journal.lib.uoguelph.ca/index.php/perj/article/view/416#.u364kvmsy0i. 25. martin zimerman, “e-books and piracy: implications/issues for academic libraries,” new library world 112 (2011): 67–75, http://dx.doi.org/10.1108/03074801111100463. 26. ibid. 27. peggy hageman, “ebooks and the long arm of the law,” econtent (june 2012), http://www.econtentmag.com/articles/column/ebookworm/ebooks-and-the-long-arm-ofthe-law--82976.htm. 28. “dataset, n.,” oed online, (oxford, uk: oxford university press, 2013), http://www.oed.com/view/entry/261122?redirectedfrom=dataset; “dataset—definition,” ontotext, http://www.ontotext.com/factforge/dataset-definition; w. paul vogt, “data set,” dictionary of statistics and methodology: a nontechnical guide for the social sciences (london, uk: sage, 2005); allan g. bluman, elementary statistics—a step by step approach (boston: mcgraw-hill, 2000). 29. david b. rorabacher, “statistical treatment for rejection of deviant values: critical values of dixon’s ‘q’ parameter and related subrange ratios at the 95% confidence level,” analytical chemistry 63 (1991): 139–45; r. b. dean and w. j. dixon, “simplified statistics for small numbers of observations,” analytical chemistry 23 (1951): 636–38, http://dx.doi.org/10.1021/ac00002a010. 30. surenda p. verma and alfredo quiroz-ruiz, “critical values for six dixon tests for outliers in normal samples up to sizes 100, and applications in science and engineering,” revista mexicana de ciencias geologicas 23 (2006): 133–61. 31. robert r. sokal and f. james rohlf, biometry (new york: freeman, 2012); j. h. zar, biostatistical analysis (upper saddle river, nj: prentice hall, 2010). 32. “null hypothesis,” accessscience (new york: mcgraw-hill education, 2002), http://www.accessscience.com. 33. ibid. 34. “critical value,” accessscience, (new york: mcgraw-hill education, 2002), http://www.accessscience.com. 35. ibid. 36. verma and quiroz-ruiz, “critical values for six dixon tests for outliers,” 133–61. http://biblio.laurentian.ca/research/sites/default/files/pictures/collection%20development%20policy.pdf http://biblio.laurentian.ca/research/sites/default/files/pictures/collection%20development%20policy.pdf https://journal.lib.uoguelph.ca/index.php/perj/article/view/416#.u364kvmsy0i http://dx.doi.org/10.1108/03074801111100463 http://www.econtentmag.com/articles/column/ebookworm/ebooks-and-the-long-arm-of-the-law--82976.htm http://www.econtentmag.com/articles/column/ebookworm/ebooks-and-the-long-arm-of-the-law--82976.htm http://www.oed.com/view/entry/261122?redirectedfrom=dataset http://www.ontotext.com/factforge/dataset-definition http://dx.doi.org/10.1021/ac00002a010 http://www.accessscience.com/ http://www.accessscience.com/ information technology and libraries | june 2014 44 37. rorabacher, “statistical treatment for rejection of deviant values,” 139–45. 38. ibid. 39. jaakko astola and pauli kuosmanen, fundamentals of nonlinear digital filtering (new york: crc, 1997); jaakko astola, pekka heinonen, and yrjö neuvo, “on root structures of median and median-type filters,” ieee transactions of acoustics, speech, and signal processing 35 (1987): 1199–201; l. ling, r. yin, and x. 
wang, “nonlinear filters for reducing spiky noise: 2dimensions,” ieee international conference on acoustics, speech, and signal processing 9 (1984): 646–49; n. j. gallagher and g. wise, “a theoretical analysis of the oroperties of median filters,” ieee transactions of acoustics, speech, and signal processing 29 (1981): 1136–41. 40. menold et al., “online outlier detection and removal,” 1110–30. 41. ibid. 42. lamothe, “factors influencing usage”; lamothe, “electronic serials usage patterns.” 43. paul gleason, “copyright and electronic publishing: background and recent developments,” acquisitions librarian 13 (2001): 5–26, http://dx.doi.org/10.1300/j101v13n26_02. 44. tena mcqueen and robert fleck jr., “changing patterns of internet usage and challenges at colleges and universities,” first monday 9 (2004), http://firstmonday.org/issues/issue9_12/mcqueen/index.html. 45. robin peek, “controlling the threat of e-book piracy,” information today 18, no. 6 (2001): 42. 46. gleason, “copyright and electronic publishing,” 5–26. http://dx.doi.org/10.1300/j101v13n26_02 http://firstmonday.org/issues/issue9_12/mcqueen/index.html 112 journal of library automation vol. 14/2 june 1981 anyway because he is primarily getting suggested classification numbers in order to browse. the tucson public library could not have made the above decisions if it did not have a complete online file of all its holdings (including even reference materials that never circulate). but since this data did exist (after a five-year bar-coding effort) and since more than forty online terminals were already in place throughout the library system to access the online file, the decision not to include locations or holdings in the microform catalog seemed reasonable . in the longer-range future (1990?), it is very likely that the entire catalog will be available online. in the meantime, the tucson public library did not want to divide its resources maintaining two location records, but rather wanted to concentrate resources in maintaining one accurate record of locations available as widely as possible throughout the library system (by installing more online terminals for staff and public use). was this decision a sound one? we don't know. the microform catalog has not yet been introduced for public use. by the end of this year we should have some preliminary answers to this question. references 1. robin w. macdonald and j. mcree elrod, "an approach to developing computer catalogs," college & research libraries 34:202--8 (may 1973). a structure code for machine readable library catalog record formats herbert h. hoffman: santa ana college, santa ana, california. libraries house many types of publications in many media, mostly print on paper, but also pictures on paper, print and pictures on film, recorded sound on plastic discs, and others. these publications are of interest to people because they contain recorded information. more precisely said, because they contain units of intellectual, artistic, or scholarly creation that collectively can be called "works." one could say simply that library materials consist of documents that are stored and cataloged because they contain works. the structure of publications into documents (or "books") and works, the clear distinction between the concept of the information container as opposed to the contents, deserves more attention than it has received so far from bibliographers and librarians. 
the importance of the distinction between books and works has been hinted at by several theoreticians, notably lubetzky. however, the idea was never fully developed. the cataloging implications of the structural diversity among documents were left unexplored. as a consequence, librarians have never disentangled the two terms book and work. from the paris principles and the marc formats to the new second edition of the anglo-american cataloguing rules, the terms book and work are used loosely and interchangeably, now meaning a book, now a work proper, now part of a work, now a group of books. such ambiguity can be tolerated as long as each person involved knows at each step which definition is appropriate when the term comes up. but as libraries ease into the age of electronic utilities and computerized catalogs based on records read by machine rather than interpreted by humans, a considerably greater measure of precision will have to be introduced into library work. as one step toward that goal an examination of the structure of publications will be in order. the items that are housed in libraries, regardless of medium, are of two types. they are either single documents, or they are groups of two or more documents. items that contain two or more documents are either finite items (all published at once, or with a first and a last volume identified) or they are infinite items (periodicals, intended to be continued indefinitely at intervals). schematically, these three types of bibliographic items in libraries can be represented as shown in figure 1. fig. 1. three types of bibliographic items: top, single-document item; center, finite multiple-document item; bottom, infinite multiple-document item. it should be noted that all publications, all documents, all bibliographic items in libraries, can be assigned to one of these three structures. there are no exceptions. all bibliographic items, furthermore, contain works. an item may contain one single work. but an item may also contain several works. schematically, the two situations can be represented as shown in figure 2. fig. 2. top, single-work document (example: a typical novel); bottom, multiple-work document (example: a collection of plays). an item that is composed of several documents and contains several works may have one work in each document, or several per document. schematically, the two possibilities can be represented as shown in figure 3. fig. 3. top, one work per document; bottom, several works per document. it is possible, of course, for an item to be composed of several documents but to contain only one work. figure 4 is a schematic representation of this case. fig. 4. multivolume work (example: a very long novel in two volumes). mixed structures are also possible, as in the schematic shown in figure 5. fig. 5. finite multi-document item containing many works, mixed structure. ignoring the mixed structure that is only a combination of two "pure" structures, the foregoing information can be combined into a table that shows seven possible publication types that differ from each other in terms of structure (figure 6). fig. 6. publication types (seven types, a through g). all bibliographic items, whether composed of one document or many, are known by a title. these titles can be called item titles. in the case of a single-document item (structures a and c), item title and document title are, of course, identical. but in the case of some multiple-document items (publications of types d, e, f, and g, for example), two possibilities exist: the documents that make up the item may or may not have their own individual document titles. for purposes of the bibliographer or cataloger, items that consist of several documents bearing individual document titles can be described under one of two principles. the entire item can be treated as a unit. elsewhere i have coined a term for this treatment: the set description principle.1 but it is also possible to treat each document as a separate publication, to describe it under the book description principle. if we combine all these considerations we find that we can assign to each bibliographic item that is added to a library's collection one of the thirteen codes shown in figure 7. fig. 7. structure codes: 1 = a, book description; 2 = c, book; 3 = b, set; 4 = d, book; 5 = d, set, with individual document titles; 6 = d, set, without individual document titles; 7 = f, book; 8 = f, set; 9 = e, book; 10 = e, set, with individual document titles; 11 = e, set, without individual document titles; 12 = g, book; 13 = g, set. how can these codes be useful? taking a look into the future, let us imagine an online catalog system supported by a database that contains the records of a library's holdings. the records in such a database are entered in a definite format. in this format, whatever it will be called, there will be data fields for titles, authors, physical descriptions, subject headings, document numbers, and much else. i propose that to these fields one other be added: the structure code. the structure code would add a new dimension to the retrieval of recorded information. here are a few specific examples. consider a search for material on subject x. qualify the search argument by structure codes 1, 3, 7, and 12. result: the search will yield only major monographic works, defined as items of types a, b, f, and g. note that subject x assigned to such items is a true subject heading. the materials retrieved in this example would all be works dealing specifically with the topic x. but the same term assigned to an item coded, say, 6, would not be a true subject heading. the term here would only give a broad general summary of what the works in the item are about. the structure code adds sophistication to the retrieval process by enabling a searcher to distinguish between specific subject designators and mere summary subject headings. a search that excludes codes 2, 4, 5, and 6 limits output to materials that are not just collections of essays. the stratagem used in card catalogs to reach the same result is the qualification of a subject heading by terms denoting format, such as the subdivisions congresses or addresses, essays, lectures. this method of qualifying subject headings has never been done consistently, however. the proposed structure code would ensure uniform treatment of all affected publications. qualify the search by codes 9, 10, 11, 13 and all periodicals can be excluded. in the card catalog, format qualifications such as periodicals, or societies, periodicals, etc., or yearbooks are sometimes added to subject headings to reach similar results. again, the structure code would introduce uniformity and consistency.
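as a rough sketch of how the proposed code could be used in a machine-readable record, consider the fragment below. the record layout, field names, and search function are hypothetical illustrations only; the code values follow the assignments in figure 7 (1 = type a, 3 = b, 7 = f, 12 = g, and so forth), not any existing record format.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class CatalogRecord:
    title: str
    subjects: List[str]
    structure_code: int  # 1-13, assigned as a by-product of cataloging

def search(records: List[CatalogRecord], subject: str,
           include: Optional[Set[int]] = None,
           exclude: Set[int] = frozenset()) -> List[CatalogRecord]:
    """retrieve records on a subject, qualified by structure codes."""
    return [r for r in records
            if subject in r.subjects
            and r.structure_code not in exclude
            and (include is None or r.structure_code in include)]

catalog = [
    CatalogRecord("a monograph on subject x", ["subject x"], 1),
    CatalogRecord("collected essays touching on subject x", ["subject x"], 6),
    CatalogRecord("a journal of subject x studies", ["subject x"], 10),
]

# only major monographic works (types a, b, f, g = codes 1, 3, 7, 12)
monographs = search(catalog, "subject x", include={1, 3, 7, 12})

# everything except collections of essays (codes 2, 4, 5, 6)
no_collections = search(catalog, "subject x", exclude={2, 4, 5, 6})

# everything except periodicals (codes 9, 10, 11, 13)
no_periodicals = search(catalog, "subject x", exclude={9, 10, 11, 13})
```

a single coded, searchable element qualified this way reproduces, uniformly, what card catalogs attempt with format subdivisions such as congresses or yearbooks.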
present-day card catalogs list publications only. they do not list the individual works that may be contained in publications. if an analytic catalog were to be built into a computerized system at some time in the future, the structure code would be a great help in the redesign, because it makes it easy to spot items that need analytics, namely those that contain embedded works, or codes 2, 4, 5, 6, 8, 9, 10, 11, and 13. a searcher working with such an analytic catalog could use the code to limit output to manageable stages: first all items of type c, for example; then broadening the search to include those of type d; and so forth, until enough relevant material has been found. the structure code would also be useful in the displayed output. if codes 5 or 8 appeared together with a bibliographic description on the screen, this would tell the catalog user that the item retrieved is a set of many separately titled documents. a complete list of those titles can then be displayed to help the searcher decide which of the documents are relevant for him. in the card catalog this is done by means of contents notes. not all libraries go to the trouble of making contents notes, though, and not all contents notes are complete and reliable. the structure code would ensure consistency and completeness of contents information at all times. codes 10 and 13 in a search output, analogously, would tell the user that the item is a serial with individual issue titles. there is no mechanism in the contemporary card catalog to inform readers of those titles. codes 4 and 7 would tell that the document is part of a finite set, and so forth. it has been the general experience of database designers that a record cannot have too many searchable elements built into its format. no sooner is one approach abandoned "because nobody needs it," than someone arrives on the scene with just that requirement. it can be anticipated, then, that once the structure code is part of the standard record format, catalog users will find many other ways to work the code into search strategies. it can also be anticipated that the proposed structure code, by adding a factor of selectivity, will help catalogers because it strengthens the authority-control aspect of machine-readable catalog files. if two publications bear identical titles, for example, and one is of structure 1, the other of structure 6, then it is clear that they cannot possibly be the same items. however, if they are of structures 1 and 7, respectively, extra care must be taken in cataloging, for they could be different versions of the same work. determination of the structure of an item is a by-product of cataloging, for no librarian can catalog a book unless he understands what the structure of that book is: one or more works, one or more documents per item, open or closed set, and so forth. it would therefore be very cheap at cataloging time to document the already-performed structure analysis and express this structure in the form of a code. references 1. herbert h. hoffman, descriptive cataloging in a new light: polemical chapters for librarians (newport beach, calif.: headway publications, 1976), p.43. revisions to contributed cataloging in a cooperative cataloging database judith hudson: university libraries, state university of new york at albany. introduction oclc is the largest bibliographic utility in the united states.
one of its greatest assets is its computerized database of standardized cataloging information. the database, which is built on the principle of shared cataloging, consists of cataloging records input from library of congress marc tapes and records contributed by member libraries. oclc standards in order to provide records contributed by member libraries that are as usable as those input from marc tapes, it is im based on data collected as part of the 2006 public libraries and the internet study, the authors assess the degree to which public libraries provide sufficient and quality bandwidth to support the library's networked services and resources. the topic is complex due to the arbitrary assignment of a number of kilobits per second (kbps) used to define bandwidth. such arbitrary definitions to describe bandwidth sufficiency and quality are not useful. public libraries are indeed connected to the internet and do provide public-access services and resources. it is, however, time to move beyond connectivity type and speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries. a secondary, but important issue is the extent to which libraries, particularly in rural areas, have access to broadband telecommunications services. the biennial public libraries and the internet studies, conducted since 1994, describe public library involvement with and use of the internet.1 over the years, the studies showed the growth of public-access computing (pac) and internet access provided by public libraries to the communities they serve. internet connectivity rose from 20.9 percent to essentially 100 percent in less than ten years; the average number of public access computers per library increased from an average of two to nearly eleven; and bandwidth rose to the point where 63 percent of public libraries have connection speeds of greater than 769kbps (kilobits per second) in 2006. this dramatic growth, replete with related information technology challenges, occurred in an environment of challenges—among them budgetary and staffing—that public libraries face in maintaining traditional services as well as networked services. one challenge is the question of bandwidth sufficiency and quality. the question is complex because typically an arbitrary number describes the number of kbps used to define "broadband." as will be seen in this paper, such arbitrary definitions to describe bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term "high speed" for connections of 200kbps in at least one direction.2 there are three problematic issues with this definition: 1. it specifies unidirectional bandwidth, meaning that a 200kbps download, but a much slower upload (e.g., 56kbps) would fit this definition; 2. regardless of direction, bandwidth of 200kbps is neither high speed nor does it allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly. 3.
the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high­use multiple­ workstation public­access context. in addition to connectivity speed, there are many ques­ tions related to public library pac and internet access that can affect bandwidth sufficiency—from budget and sus­ tainability, staffing and support, to services public librar­ ies offer through their technology infrastructure, and the impacts of connectivity and pac on the communities that libraries serve. one key question, however, is what is quality pac and internet bandwidth for public libraries? and, in attempting to answer that question, what are measures and benchmarks of quality internet access? this paper provides data from the 2006 public libraries and the internet study to foster discussion and debate around determining quality pac and internet access.3 bandwidth and connectivity data at the library outlet or branch level are presented in this article. the band­ width measures are not systemwide but rather at the point of service delivery in the branch. ■ the bandwidth issue there are a number of factors that affect the sufficiency and quality of bandwidth in a pac and internet service context. examples of factors that influence actual speed include: ■ number of workstations (public­access and staff) that simultaneously access the internet; ■ provision of wireless access that shares the same con­ nection; ■ ultimate connectivity path—that is, a direct connec­ tion to the internet that is truly direct, or one that goes through regional or other local hops (that may have aggregated traffic from other libraries or orga­ nizations) out to the internet; john carlo bertot and charles r. mcclure assessing sufficiency and quality of bandwidth for public libraries john carlo bertot (jbertot@fsu.edu) is the associate director of the information use management and policy institute and professor at the college of information, florida state university; and charles r. mcclure (cmcclure@ci.fsu.edu) is the director of the information use management and policy institute (www .ii.fsu.edu) and francis eppes professor of information studies at the college of information, florida state university. article title | author 15assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 15 ■ type of connection and bandwidth that the telecom­ munications company is able to supply the library; ■ operations (surfing, e­mail, downloading large files, streaming content) being performed by users of the internet connection; ■ switching technologies; ■ latency effects that affect packet loss, jitter, and other forms of noise throughout a network; ■ local settings and parameters, known or unknown, that impede transmission or bog down the delivery of internet­based content; ■ range of networked services (databases, videoconfer­ encing, interactive/real­time services) to which the library is linked; ■ if networked, the speed of the network on which the public­access workstations reside; and ■ general application resource needs, protocol priority, and other general factors. thus, it is difficult to precisely answer “how much bandwidth is enough” within an evolving and dynamic context of public access, use, and infrastructure. putting public­access internet use into a more typi­ cal application­and­use scenario, however, may provide some indication of adequate bandwidth. 
for example: ■ a typical three-minute digital song is 3mb; ■ a typical digital photo is about 2mb; and ■ a typical powerpoint presentation is about 10mb. if one person in a public library were to e-mail a powerpoint presentation at the same time that another person downloaded multiple songs, and another was exchanging multiple pictures, even a library with a t1 line (1.5mbps—megabits per second) would experience a temporary network slowdown during these operations. this does not take into account many other new high-bandwidth-consuming applications such as cnn streaming-video channel; uploading and accessing content to a wiki, blog, or youtube.com; or streaming content such as cbs's webcasting the 2006 ncaa basketball tournament. an increasingly used technology in various settings is two-way internet-based video conferencing. with an installed t1 line, a library could support two 512kbps or three 384kbps videoconferences, depending on the amount of simultaneous traffic on the network—which, in a public access context, would be heavy. indeed, the 2006 public libraries and the internet study indicated a near continuous use of public-access workstations by patrons (only 14.6 percent of public libraries indicated that they always had a sufficient number of workstations available for patron use). public libraries increasingly serve as access points to e-government services and resources, e.g., social services, disaster relief, health care.4 these services can range from the simple completion of a web-based form (low-bandwidth consumption) to more interactive services (high-bandwidth consumption). and, as access points to continuing education and online degree programs, public libraries need to offer adequate broadband to enable users to access services and resources that increasingly can depend on streaming technologies that consume greater bandwidth. ■ bandwidth and pac in public libraries today as table 1 demonstrates, public libraries continue to increase their bandwidth, with 63.3 percent of public libraries reporting connection speeds of 769kbps or greater. this compares to 47.7 percent of public libraries reporting connection speeds of greater than 769kbps in 2004. there are disparities between rural and urban public libraries, with rural libraries reporting substantially fewer instances of connection speeds of greater than 1.5mbps in 2006. on the one hand, the increase in connectivity speeds between 2004 and 2006 is a positive step. on the other, 16.1 percent of public libraries report that their connection speeds are insufficient to meet patron demands all of the time, and 29.4 percent indicate that their connection speeds are insufficient to meet patron demands some of the time. thus, nearly half of public libraries indicate that their connection speeds are insufficient to meet patron demands some or all of the time. in terms of public access computers, the average number of workstations that public libraries provide is 10.7 (table 2). urban libraries have an average of 17.1 workstations, as compared to rural libraries, which report an average of 7.1 workstations.
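to make the application-and-use scenario above concrete, the arithmetic can be spelled out. file sizes are quoted in megabytes while a t1 line carries roughly 1.5 megabits per second, so an eight-to-one conversion applies; the figures below assume three songs, three photos, ideal throughput, and no competing traffic, none of which a busy public-access context would ever see.

```python
T1_MEGABITS_PER_SECOND = 1.544   # nominal t1 capacity, ideal conditions
BITS_PER_BYTE = 8

# traffic from the scenario above, in megabytes (counts are assumptions)
traffic_mb = {
    "one powerpoint presentation": 10,
    "multiple songs (three at 3mb each)": 9,
    "multiple photos (three at 2mb each)": 6,
}

total_megabytes = sum(traffic_mb.values())
total_megabits = total_megabytes * BITS_PER_BYTE
seconds = total_megabits / T1_MEGABITS_PER_SECOND

print(f"{total_megabytes} mb of traffic is {total_megabits} megabits")
print(f"on a fully dedicated t1 that takes about {seconds / 60:.1f} minutes")
# 25 mb -> 200 megabits -> roughly two minutes during which every other user
# of the shared connection experiences the slowdown described above

# the videoconferencing figures follow the same logic: the streams must fit
# within the t1's roughly 1,544 kbps of capacity
assert 2 * 512 <= 1544 and 3 * 384 <= 1544
```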
a closer look at bandwidth and pac for the next sections, the data offer two key views for analysis purposes: (1) workstations—divided into libraries with ten or fewer public­access workstations and libraries with more than ten public­access worksta­ tions (given that the average number of public­access workstations in libraries is roughly ten); and (2) band­ width—divided into libraries with 769kbps or less and libraries with greater than 769kbps (an arbitrary indicator of broadband for a public library context). in looking across bandwidth and public­access work­ stations (table 3), overall 31.8 percent of public libraries have connection speeds of less than 769kbps while 63.3 percent have connection speeds of greater than 769kbps. a majority of public libraries—68.5 percent—have ten or fewer workstations, while 30.9 percent have more than ten workstations. in general, rural libraries have fewer workstations and lower bandwidth as compared to sub­ urban and urban libraries. indeed, 75.2 percent of urban 16 information technology and libraries | march 200716 information technology and libraries | march 2007 libraries with fewer than ten workstations have connec­ tion speeds of greater than 769kbps, as compared to 45.2 percent of rural libraries. when examining pac capacity, it is clear that public libraries have capacity issues at least some of the time in a typical day (tables 4 through 6). only 14.6 percent of public libraries report that they have sufficient numbers of workstations to meet patron demands at all times (table 6), while nearly as many, 13.7 percent, report that they consistently are unable to meet patron demands for public­access workstations (table 4). a full 71.7 percent indicate that they are unable to meet patron demands during certain times in a typical day (see table 5). in other words, 85.4 percent of public libraries report that they are unable to meet patron demand for public­access workstations some or all of the time during a typical day—regardless of number of workstations available and type of library. the disparities between rural and urban libraries are notable. in general, urban libraries report more difficulty in meeting patron demands for public­access workstations. of urban public libraries, 27.8 percent report that they consistently have difficulty in meeting patron demand for workstations, as compared to 11.0 percent of suburban and 10.6 percent of rural public libraries (table 4). by contrast, 6.6 percent of urban libraries report sufficient workstations to meet patron demand all the time as compared to 18.9 percent of rural libraries (table 6). when reviewing the adequacy of speed of connectiv­ ity data by the number of workstations, bandwidth, and metropolitan status, a more robust and descriptive pic­ table 1. 
public library outlet maximum speed of public-access internet services by metropolitan status and poverty metropolitan status poverty level maximum speed urban suburban rural low medium high overall less than 56kbps 0.7% ±0.8% (n=18) 0.4% ±0.6% (n=17) 3.7% ±1.9% (n=275) 2.0% ±1.4% (n=245) 2.7% ±1.6% (n=61) 2.6% ±1.6% (n=5) 2.1% ±1.4% (n=311) 56kbps– 128kbps 2.5% ±1.6% (n=67) 5.4% ±2.3% (n=264) 15.2% ±3.6% (n=1,132) 9.9% ±3.0% (n=1,237) 9.5% ±2.9% (n=216) 5.3% ±2.2% (n=10) 9.8% ±3.0% (n=1,463) 129kbps– 256kbps 2.7% ±1.6% (n=72) 6.8% ±2.5% (n=332) 11.1% ±3.1% (n=829) 8.5% ±2.8% (n=1,067) 7.3% ±2.6% (n=166) 8.2% ±2.8% (n=1,233) 257kbps–768kbps 9.1% ±2.9% (n=241) 10.4% ±3.1% (n=504) 13.4% ±3.4% (n=1,002) 12.5% ±3.3% (n=1,557) 8.4% ±2.8% (n=190) 11.7% ±3.2% (n=1,747) 769kbps– 1.5mbps 33.6% ±4.7% (n=889) 40.0% ±4.9% (n=1,945) 31.0% ±4.6% (n=2,310) 34.3% ±4.8% (n=4,286) 34.6% ±4.8% (n=788) 38.1% ±4.9% (n=70) 34.4% ±4.8% (n=5,144) greater than 1.5mbps 49.4% ±5.0% (n=1,304) 31.6% ±4.7% (n=1,533) 19.9% ±4.0% (n=1,488) 27.4% ±4.5% (n=3,423) 35.5% ±4.8% (n=808) 50.5% ±5.0% (n=93) 28.9% ±4.5% (n=4,324) don’t know 1.9% ±1.4% (n=50) 5.4% ±2.3% (n=263) 5.7% ±2.3% (n=427) 5.5% ±2.3% (n=685) 2.1% ±1.4% (n=48) 3.5% ±1.8% (n=6) 4.9% ±2.2% (n=739) weighted missing values, n=1,497 table 2. average number of public library outlet graphical publicaccess internet terminals by metropolitan status and poverty* poverty level metropolitan status low medium high overall urban 14.7 20.9 30.7 17.9 suburban 12.8 9.7 5.0 12.6 rural 7.1 6.7 8.1 7.1 overall 10.0 13.3 26.0 10.7 * note that most library branches defined as “high poverty” are in general part of library systems with multiple branches and not single building systems. by and large, library systems connect and provide pac and internet services systemwide. article title | author 17assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 17 ture emerges. while overall, 53.5 percent of public librar­ ies indicate that their connection speeds are adequate to meet demand, some parsing of this figure reveals more variation (tables 7 through 10): ■ libraries with connection speeds of 769kpbs or less are more likely to report that their connection speeds are insufficient to meet patron demand at all times, with 24.0 percent of rural libraries, 25.8 percent of suburban libraries, and 25.4 percent of urban libraries so reporting (table 7). ■ libraries with connection speeds of 769kpbs or less are more likely to report that their connection speeds are insufficient to meet patron demand at some times, with 35.0 percent of rural libraries, 38.1 per­ cent of suburban libraries, and 53.4 percent of urban libraries so reporting (table 8). ■ libraries with connection speeds of greater than 769kbps also report bandwidth­sufficiency issues, with 12.0 percent of rural libraries, 10.5 percent of suburban libraries so reporting; and 14.0 percent of urban librar­ ies indicating that their connection speeds are insuf­ ficient all of the time (table 7); 20.3 percent of rural libraries, 29.5 percent of suburban libraries, and 30.0 percent of urban libraries indicating that their connec­ tion speeds are insufficient some of the time (table 8). ■ libraries that have ten or fewer workstations tend to rate their bandwidth as more sufficient at either 769kbps or less or greater than 769kbps (tables 7, 8, and 10). 
thus, in looking at the data, it is clear that libraries with fewer workstations indicate that their connection speeds are more sufficient to meet patron demand. table 3. public library public-access workstations and speed of connectivity by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 48.4% n=2,929 45.2% n=2,737 30.1% n=891 63.2% n=1,872 21.6% n=269 75.2% n=937 more than 10 workstations 22.0% n=307 75.5% n=1,053 12.0% n=225 85.1% n=1,595 9.6% n=130 89.8% n=1,221 total 43.4% n=3,242 50.9% n=3,802 23.0% n=1,116 71.6% n=3,474 15.1% n=399 83.0% n=2,194 missing: 7.6% (n=1,239) table 4. fewer public library public-access workstations than patrons wishing to use them by metropolitan status rural suburban urban total 10 or fewer workstations 10.5% n=681 10.8% n=339 23.6% n=300 12.1% n=1,321 more than 10 workstations 10.8% n=158 11.4% n=220 31.2% n=430 16.9% n=808 total 10.6% n=845 11.0% n=562 27.8% n=748 13.7% n=2,157 missing: 2.9% (n=473) table 5. fewer public library public-access workstations than patrons wishing to use them at certain times during a typical day by metropolitan status rural suburban urban total 10 or fewer workstations 68.8% n=4,444 74.5% n=2,347 69.1% n=880 70.5% n=7,670 more than 10 workstations 78.1% n=1,139 80.2% n=1,548 62.8% n=866 74.5% n=3,553 total 70.5% n=5,605 76.7% n=3,905 65.6% n=1,764 71.7% n=11,273 missing: 2.9% (n=473) table 6. sufficient public library public-access workstations available for patrons wishing to use them by metropolitan status rural suburban urban total 10 or fewer workstations 20.6% n=1,331 14.7% n=464 7.4% n=94 17.4% n=1,889 more than 10 workstations 11.0% n=161 8.4% n=163 6.0% n=83 8.5% n=406 total 18.9% n=1,501 12.3% n=627 6.6% n=177 14.6% n=2,304 missing: 2.9% (n=473) 18 information technology and libraries | march 200718 information technology and libraries | march 2007 table 7. public library connection speed insufficient to meet patron needs by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 25.4% n=668 12.1% n=297 27.4% n=233 9.8% n=173 15.4% n=34 10.2% n=90 more than 10 workstations 11.6% n=34 11.4% n=108 19.2% n=41 11.3% n=168 25.4% n=32 17.1% n=199 total 24.0% n=705 12.0% n=408 25.8% n=274 10.5% n=341 18.7% n=72 14.0% n=293 table 8. public library connection speed insufficient to meet patron needs at some times by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 34.1% n=898 19.3% n=474 37.1% n=315 29.0% n=511 50.0% n=130 27.0% n=238 more than 10 workstations 43.2% n=127 22.5% n=214 42.3% n=90 30.3% n=450 60.3% n=76 32.0% n=374 total 35.0% n=1,025 20.3% n=694 38.1% n=405 29.5% n=961 53.4% n=206 30.0% n=626 table �. public library connection speed is sufficient to meet patron needs by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 38.9% n=1,025 68.3% n=1,675 35.0% n=297 60.2% n=1,062 34.6% n=90 62.9% n=556 more than 10 workstations 45.2% n=133 66.1% n=628 38.5% n=82 54.9% n=817 14.3% n=18 50.9% n=594 total 39.5% n=1,158 67.5% n=2,306 35.7% n=379 57.9% n=1,886 28.0% n=108 56.0% n=1,168 table 10. 
public library connection speed insufficient to meet patron needs some or all of the time by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 59.5% n=1,566 31.4% n=771 64.6% n=549 38.8% n=684 65.4% n=170 37.1% n=328 more than 10 workstations 54.8% n=161 33.9% n=322 61.5% n=131 41.6% n=618 85.7% n=108 49.1% n=573 total 24.0% n=1,025 32.3% n=1,102 64.0% n=680 40.0% n=1,302 72.0% n=278 44.0% n=919 article title | author 1�assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 1� ■ discussion and selected issues the data presented point to a number of issues related to the current state of public library pac and internet­access adequacy in terms of available public access computers and bandwidth. the data also provide a foundation upon which to discuss the nature of quality and sufficient pac and internet access in a public library environment. while public libraries indicate increased ability to meet patron bandwidth demand when providing fewer publicly avail­ able workstations, public libraries indicate that they have difficulty in meeting patron demand for public access computers. growth of wireless connections in 2004, 17.9 percent of public library outlets offered wire­ less access, and a further 21.0 percent planned to make it available. outlets in urban and high­poverty areas were most likely to have wireless access. the majority of librar­ ies (61.2 percent), however, neither had wireless access nor had plans to implement it in 2004. as table 11 demon­ strates, the number of public library outlets offering wire­ less access has roughly doubled from 17.9 percent to 36.7 percent in two years. furthermore, 23.1 percent of outlets that do not currently have it plan to add wireless access in the next year. thus, if libraries follow through with their plans to add wireless access, 61.0 percent of public library outlets in the united states will have it by 2007. the implications of the rapid growth of the public library’s provision of wireless connectivity (as shown in table 11) on bandwidth requirements are significant. either libraries added wireless capabilities through their current overall bandwidth, or they obtained additional bandwidth to support the increased demand created by the service. if the former, then wireless access created an even greater burden on an already problematic band­ width capacity and may have actually reduced the overall quality of connectivity in the library. if the latter, libraries then had to shoulder the burden of increased expendi­ tures for bandwidth. either scenario required additional technology infrastructure, support, and expenditures. sufficient and quality connections the notion of sufficient and quality public library con­ nection to the internet is a moving target and depends on a range of factors and local conditions. for purposes of discussion in this paper, the authors used 769kbps to differentiate “slower” from “faster” connectivity. if, how­ ever, 1.5mbps or greater had been used to define faster connectivity speeds, then only 28.9 percent of public libraries would meet the criterion of “faster” connectiv­ ity (see table 1). and in fact, simply because 28.9 percent of public libraries report connection speeds of 1.5mbps or faster does not also mean that they have sufficient or quality bandwidth to meet the computing needs of their users, their staff, their vendors, and their service provid­ ers. 
some public libraries may need 10mbps to meet the pac needs of their users as well as the internal staff and management computing needs. the library community needs to become more edu­ cated and knowledgeable about what constitutes sufficient and quality connectivity in their library for the communi­ ties that they serve. a first step is to understand clearly the nature and type of the connectivity of the library. the next step is to conduct an internal audit that minimally: ■ identifies the range of networked services the library provides both to users as well as for the operation of the library; ■ identifies the typical bandwidth consumption of these services; ■ determines the demands of users on the bandwidth in terms of services they use; ■ determines peak bandwidth­usage times; ■ identifies the impact of high­consumption networked services used at these peak­usage times; ■ anticipates bandwidth demands of newer services and resources that users will want to access through the library’s infrastructure—myspace.com, youtube. com—regardless of whether or not the library is the direct provider of such services; and ■ determines what broadband services are available to the library, the costs of these services, and the “fit” of these services to the needs of the library. based on this and related information from such an audit, library administration can better determine the degree to which the bandwidth is sufficient in speed and quality. ■ planning for sufficient and quality bandwidth knowing the current condition of existing bandwidth in the library is not the same as successful technology plan­ ning and management to ensure that the library has, in fact, bandwidth that is sufficient in speed and quality. once an audit such as has been suggested is completed, careful planning for bandwidth deployment in the library is essential. it appears, however, that currently much of the management and planning for networked services is based first on what bandwidth is available as opposed to the bandwidth that is needed to provide the necessary services and resources in a networked environment. this stance puts public libraries in a reactive condition rather than a proactive condition regarding provision of net­ worked services. 20 information technology and libraries | march 200720 information technology and libraries | march 2007 most public library planning approaches stress the importance of conducting some type of needs assessment as a precursor to any type of planning.5 further, technology plans should include such things as goals, objectives, ser­ vices provision, and evaluation as they relate to bandwidth and the appropriate bandwidth needed. recent library technology planning guides, however, give little attention to the management, planning, and evaluation of band­ width as it relates to provision of networked services. it must be noted that some public libraries may be prevented from accessing higher bandwidth due to high cost, lack of availability of bandwidth alternatives, or other local factors that determine access to advanced telecommunications in their areas. in such circumstances, the audit may serve to inform the public service/utilities commissions, fcc, and others of the need for deploy­ ment of advanced telecommunications services in these areas. ■ bandwidth planning in a community context the audit and planning processes that have been described are critical activities for libraries. it is essential, however, for these processes to occur in the larger community con­ text. 
investments in technology infrastructure are increas­ ingly a community­wide resource that services multiple functions—emergency services, community access, local government agencies, to name a few. it is in this larger context that library pac and internet access occurs. moreover, there is a convergence of technology and service needs. for example, public libraries increasingly serve as agents of e­government and disaster­relief providers.6 first responders rely on the library’s infrastructure when theirs is destroyed, as hurricane katrina and other storms demonstrated. local, state, and federal government agen­ cies rely on broadband and pac and internet access (wired or wireless) to deliver e­government services. thus, at their core, libraries, emergency services, gov­ ernment agencies, and others have similar needs. pooling resources, planning jointly, and looking across needs may yield economies of scale, better service, and a more robust community technology infrastructure. emergency providers need access to reliable broadband and commu­ nications technologies in general, and in emergency situ­ ations in particular. libraries need access to high­quality broadband and pac technologies. both need access to wireless technologies. as broadcast networks relinquish ownership of the 700 mhz frequency used for analog television in february 2009, and this frequency is distributed to municipali­ ties for emergency services, now is an excellent time for libraries to engage in community technology planning for e­government, disaster planning and relief efforts, and pac and internet services. by working with the larger community to build a technology infrastructure, the library and the entire community benefit. ■ availability to high-speed connectivity one key consideration not known at this time is the extent to which public libraries—particularly those in rural areas—even have access to high­speed connec­ tions. many rural communities are served not by the large telecommunications carriers, but rather by small, privately owned­and­run local exchange carriers. iowa and wisconsin, for example, are each served by more than eighty exchange carriers. as such, public libraries are limited in capacity and services to what these exchange table 11. public-access wireless internet connectivity availability in public library outlets by metropolitan status and poverty metropolitan status poverty level provision of public-access wireless internet services urban suburban rural low medium high overall currently available 42.9% ± 4.9% (n=1,211) 42.5% ± 4.9% (n=2,240) 30.7% ± 4.6% (n=2,492) 38.0% ± 4.8% (n=5,165) 28.1% ±4.5% (n=679) 53.8% ± 5.0% (n=99) 36.7% ± 4.8% (n=5,943) not currently available and no plans to make it available within the next year 23.1% ± 4.2% (n=651) 29.7% ± 4.6% (n=1,562) 49.2% ± 5.0% (n=3,988) 37.4% ± 4.8% (n=5,091) 44.4% ± 4.9% (n=1,072) 21.0% ± 4.1% (n=39) 38.3% ± 4.9% (n=6,201) not currently available, but there are plans to make it available within the next year 30.6% ± 4.6% (n=864) 26.0% ± 4.4% (n=1,369) 18.6% ± 3.9% (n=1,509) 22.5% ± 4.2% (n=3,063) 26.2% ± 4.4% (n=633) 25.3% ± 4.4% (n=46) 23.1% ± 4.2% (n=3,742) article title | author 21assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 21 carriers offer and make available. thus, in some areas, dsl service may be the only form of high­speed connec­ tivity available to libraries. 
and, as suggested earlier, dsl may or may not be considered high speed given the needs of the library and the demands of its users. communities that lack high­quality broadband ser­ vices by telecommunications carriers may want to con­ sider building a municipal wireless network that meets the community’s broadband needs for emergency, disas­ ter, and public­access settings. as a community engages in community­wide technology planning, it may become evident that local telecommunications carriers do not meet the broadband needs of the community. such com­ munities may need to build their own networks, based on identified technology­plan needs. ■ knowledge of networked services connectivity needs patrons may not attempt to use high­bandwidth services at the public library because they know from previous visits that the library cannot provide acceptable connec­ tivity speeds to access that service—thus, they quit trying to access that service, limiting the usefulness of the pub­ lic library. in addition, librarians may have inadequate knowledge or information to determine when bandwidth is or is not sufficient to meet the demands of their users. indeed, the survey and site visits revealed that some librarians did not know the connection speeds that linked their library to the internet. consequently, libraries are in a dilemma: increase both the number of workstations and the bandwidth to meet demand; or provide less service in order to operate within the constraints of current connectivity infrastruc­ ture. and yet, roughly 45 percent of public libraries indi­ cate that they have no plans to add workstations within the next two years; the average number of workstations has been around ten for the last three surveys (2002, 2004, and 2006); and 80 percent of public libraries indicate that space limitations affect their ability to add workstations.7 hence, for many libraries, adding workstations is not an option. ■ missing the mark? the networked environment is such that there are multi­ ple uses of bandwidth within the same library—for exam­ ple, public internet access, staff access, wireless access, integrated library system access. we are now in the web 2.0 environment, which is an interactive web that allows for content uploading by users (e.g., blogs, mytube.com, myspace.com, gaming). streaming content, not text, is increasingly the norm. there are portable devices that allow for text, video, and voice messaging. increasingly, users desire and prefer wireless services. this is a new environment in which libraries provide public access to networked services and resources. it is an enabling environment that puts users fully in the content seat—from creation to design to organization to access to consumption. and users have choices, of which the public library is only one, regarding the information they choose to access. it is an environment of competition, advanced applications, bandwidth intensity, and high­quality com­ puters necessary to access the graphically intense content. the impacts of this new and substantially more com­ plex environment on libraries are potentially significant. as user expectations rise, combined with the provision of high­quality services by other providers, libraries are in a competitive and service­ and resource­rich informa­ tion environment. 
providing “bare minimum” pac and internet access can have two detrimental effects in that they: (1) relegate libraries to places of last resort, and (2) further digitally divide those who only have public­access computers and internet access through their public librar­ ies. it is critical, therefore, for libraries to chart a high­end course regarding pac and internet access, and not access that is merely perceived to be acceptable by the librarians. ■ additional research the context in which issues regarding quality pac and sufficient connectivity speeds to internet access reside is complex and rapidly changing. research questions to explore include: ■ is it possible to define quality pac and internet access in a public library context? ■ if so, what are the attributes included in the defini­ tion? ■ can these attributes be operationalized and mea­ sured? ■ assuming measurable results, what strategies can the library, policy, research, and other interested communities employ to impact public library move­ ment toward quality pac and internet access? ■ should there be standards for sufficient connectivity and quality pac in public libraries? ■ how can public librarians be better informed regard­ ing the planning and deployment of sufficient and quality bandwidth? ■ what is the role of federal and state governments in supporting adequate bandwidth deployment for public libraries?8 ■ to what extent is broadband deployment and avail­ ability truly universal as per the universal service 22 information technology and libraries | march 200722 information technology and libraries | march 2007 (section 254) of the telecommunications act of 1996 (p.l. 104­104)? these questions are a beginning point to a larger set of activities that need to occur in the research, practitioner, and policy­making communities. ■ obtaining sufficient and quality public-library bandwidth arbitrary connectivity speed targets, e.g., 200kbps or 769kbps, do not in and of themselves ensure quality pac and sufficient connectivity speeds. public libraries are indeed connected to the internet and do provide public­ access services and resources. it is time to move beyond connectivity­type and ­speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries. given the widespread connectivity now provided from most public libraries, there continue to be increased demands for more and better networked services. these demands come from governments that expect public libraries to support a range of e­government services, from residents who want to use free wireless connectivity from the public library, to patrons who need to download music or view streaming videos (to name but a few). simply providing more or better connectivity will not, in and of itself, address all of these diverse service needs. increasingly, pac support will require additional public librarian knowledge, resources, and services. sufficient and quality bandwidth is a key component of those services. the degree to which public libraries can provide such enhanced networked services (requiring exceptionally high bandwidth that is both sufficient and of high quality) is unclear. mounting a significant effort now to better understand existing bandwidth use and plan for future needs and requirements in individual public libraries is essential. in today’s networked envi­ ronment, libraries must stay competitive in the provision of networked services. 
such will require sufficient and high­quality connectivity and bandwidth. ■ acknowledgements the authors gratefully acknowledge the support of the bill & melinda gates foundation and the american library association for support of the 2006 public libraries and the internet study. data from that study have been incorpo­ rated into this paper. references 1. information institute, public libraries and the internet (tal­ lahassee, fla.: information use management and policy insti­ tute, 2006). all studies conducted since 1994 are available at: http://www.ii.fsu.edu/plinternet (accessed march 1, 2007). 2. u.s. federal communications commission, high speed services for internet access: status as of december 31, 2005 (wash­ ington, d.c.: fcc, 2006), available at http://www.fcc.gov/ bureaus/common_carrier/reports/fcc­state_link/iad/ hspd0604.pdf (accessed mar. 1, 2007). 3. j. c. bertot et al., public libraries and the internet 2006 (tal­ lahassee, fla.: information use management and policy insti­ tute, forthcoming), available at http://www.ii.fsu.edu/plinternet (accessed mar. 1, 2007). 4. j. c. bertot et al., “drafted: i want you to deliver e­ government,” library journal 131, no. 13 (aug. 2006): 34–37. 5. c. r. mcclure et al., planning and role setting for public libraries: a manual of options and procedures (chicago: ala, 1987); e. himmel and w. j. wilson, planning for results: a public library transformation process (chicago, ala, 1997). 6. j. c. bertot et al., “drafted: i want you to deliver e­gov­ ernment.”; p. t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41. 7. j. c. bertot et al., public libraries and the internet 2006. 8. jaeger et al., “the policy implications of internet connec­ tivity in public libraries.” fagan 140 information technology and libraries | september 2006 visual search interfaces have been shown by researchers to assist users with information search and retrieval. recently, several major library vendors have added visual search interfaces or functions to their products. for public service librarians, perhaps the most critical area of interest is the extent to which visual search interfaces and text-based search interfaces support research. this study presents the results of eight full-scale usability tests of both the ebscohost basic search and visual search in the context of a large liberal arts university. l ike the web, online library research database interfaces continue to evolve. even with the smaller scope of library research databases, users can still suffer from information overload and may have difficulty in processing large results sets. web search-engine research has shown that the number of searchers viewing only the first results page has increased from 29 percent in 1997 to 73 percent in 2002 for united states-based web searchengines users.1 additionally, the mean number of results viewed per query in 2001 was 2.5 documents.2 this may indicate either increasing relevance in search results or an increase in simplistic web interactions. visual alternatives to search interfaces attempt to address some of the problems of information retrieval within large document sets. 
while research and development of visual search interfaces began well before the advent of the web, current research into visual web interfaces has continued to expand.3 within librarianship, the most visual interface research seems to focus on those that could be applied to large-scale digital library projects.4 although library products often have more metadata and organizational structure than the web, search engine-style interfaces adapted for field searching and boolean operators are still the most frequent approach to information retrieval.5 yet research has shown that visual interfaces to digital libraries offer great benefit to the user. zaphiris emphasizes the advantage of shifting the user’s mental load “from slow reading to faster perceptual processes such as visual pattern recognition.”6 according to borner and chen, visual interfaces can help users better understand search results and the interrelation of documents within the result set, and refine their search.7 in their discussion of the function of “overviews” in visual interfaces, greene and his colleagues say that overviews can help users make better decisions about potential relevance, and “extract gist more accurately and rapidly than traditional hit lists provided by search engines.”8 several library database vendors are implementing visual interfaces to navigate and display search results. serials solutions’ new federated search product, centralsearch, uses technology from vivisimo that “organizes search results into titled folders to build a clear, concise picture for its users.”9 ulrich’s fiction connection web site has used aquabrowser to help one “discover titles similar to books you already enjoy.”10 the queens library has also implemented aquabrowser to provide a graphical interface to its entire library’s collections.11 xreferplus maps search results to topics by making visual connections between terms.12 comabstracts, from cios, uses a similar concept map, although one cannot launch a search directly from the tool. groxis chose a circular style for its concept-mapping software, grokker. partnerships between groxis and stanford university began as early as 2004, and grokker is now being implemented at stanford university libraries academic and information resources.13 ebsco and groxis announced their partnership in march 2006.14 the ebscohost interface now features a visual search tab as an option that librarians can choose to leave on (by default) or turn off in ebsco’s administrator module. figure 1 shows a screenshot of the visual search interface. within the context of library research databases, visual searching likely provides a needed alternative from traditional, text-based searching. to test this hypothesis, james madison university libraries (jmu libraries) decided to conduct eight usability sessions with ebscohost’s new visual search, in coordination with ebsco and groxis. while this is by no means the first published usability test of vendor interfaces, the literature understandably reveals a far greater number of usability tests on in-house projects such as library web sites and customized catalog interfaces than on library database interfaces.15 it is hoped that by observing users try both the ebsco basic search and visual search, more understanding will be gained about user search behavior and the potential benefits of a visual approach. 
usability testing of a large, multidisciplinary library database: basic search and visual search. jody condit fagan (faganjc@jmu.edu) is digital services librarian at carrier library, james madison university, harrisonburg, virginia. ฀ method the usability sessions were conducted at jmu, a large liberal arts university whose student population is mostly drawn from virginia and the northeastern region. only 10 percent of the students are from minority groups. jmu requires that all freshmen pass the online information skills seeking test (isst) before becoming a sophomore, and the libraries developed a web tutorial, "go for the gold," to prepare students for the isst. therefore, usability-test participants were largely white, from the northeastern united states, and had exposure to basic information literacy instruction. jmu libraries' usability lab is a small conference room with one computer workstation equipped with morae software.16 audio and video recordings of user speech and facial expressions, along with "detailed application and computer system data," are captured by the software and combined into a searchable recording session for the usability tester to review. a screenshot of the morae analysis tool is shown in figure 2. the usability test script was developed in collaboration with representatives of ebsco and groxis. ebsco provided access to the beta version of visual search for the test, and groxis provided financial incentives for student participants. the test sessions and the results analysis, however, were conducted solely by the researcher and librarian facilitators. the visual search development team was provided with the results and video clips after analysis. usability study participants were recruited by posting an announcement to the jmu students' web portal. a $25 gift certificate was offered as an incentive, and more than 140 students submitted a participation interest form. these were sorted by the number of years the student(s) had been at jmu to try to get as many novice users as possible. because so much of today's student work is conducted in groups, four groups of two, as well as four individual sessions, were scheduled, for a total of twelve students. jmu librarians who had received both human-subjects training and an introduction to facilitation served as facilitators to the usability sessions. their role was to watch the time and ask open-ended questions to keep the student participants talking about what they were doing. the major research question it was hoped would be answered by the tests was, "to what extent does ebsco's basic search interface and visual search interface support student research?" since the tests could not evaluate the entire research process, it was decided to focus on the development of the research topic. specifically, the goal was to find out how well each interface supported the intellectual process of the students in coming up with a topic, narrowing their topic, and performing searches on their chosen subtopics. an additional goal was to determine how well users were able to find and use the interface widgets and how satisfied the students felt after using the interfaces. the overall session was structured in this order: a pretest survey about the students' research experience; a series of four tasks performed with ebscohost's basic search; a series of three tasks performed with ebscohost's visual search; and a posttest interview.
both basic and visual search interfaces were used with academic search premier. each of the eight sessions was recorded in entirety by the morae software, and each recording was viewed in entirety. to try to gain some quantitative data, the researcher measured the time it took to complete each task. however, due to variables such as facilitator involvement and interaction between group members, the numbers did not lend themselves to comparison. also, it would not have been clear whether greater numbers indicated a positive or negative sign. taking longer to come up with subtopics, for example, could as easily be a sign of exploration and interested inquiry as it might be of frustration or failure. as such, the data are mostly qualitative in nature. figure 1. screenshot of ebscohost’s visual search figure 2. screenshot of morae recorder analysis tool 142 information technology and libraries | september 2006 ฀ results the student participants were generally underclassmen. two of the students, group 2, were in their third year at jmu. all others were in their first or second year. while students were drawn from a wide variety of majors, it is regrettable that there was not stronger representation from the humanities. when asked, “what do you normally use to do research?” six students answered an unqualified “google.” three other students mentioned internet search engines in their response. only two students gave the brand or product names of library research databases: one said, “pubmed, wilsonomnifile, and ebsco,” while the other, a counseling major, mentioned psycinfo and cinahl. when shown a screenshot of basic search, half of the students said they had used an ebsco database before. all of the participants said they had never before used a visual search interface. the full results from the individual pretest interviews are shown in figures 3 and 4. to begin the usability test, the facilitator started internet explorer and loaded the ebscohost basic search, which was set to have a single input box. the scripts for each task are listed in figure 5. note that task 4 was only featured in the basic search portion of the test. for task 1 on the basic search—coming up with a general topic—all of the participants began by using their own topics rather than choosing from the list of ideas. also, although they were asked to “spend some time on ebsco to come up with a possible general topic,” all but group 6 fulfilled this by simply thinking of a topic (sometimes after some discussion within the groups of two) and typing it in. with the exception of group 6, the size of the result set did not inspire topic changes. figure 6 summarizes the students’ searches and relative success on task 1. in retrospect, the tests might have yielded more straightforward findings if the students had been directed to choose from the provided list of topics, or even to use the same topic. however, part of the intention was to determine whether either interface was helpful in guiding the students’ topic development. it was hoped that by defining the scenario as writing a paper for class, their topic selection would reflect the realities of student research. however, it probably would have been better to have used the same topic for each session. task 2 asked participants to identify three subtopics, and task 3 asked them to refine their search to one subtopic and limit it to the past two years. a summary of these tasks appears in figure 7. 
a surprising finding during task 2 was that students did go past the first page of results: four groups did so, while two groups did not get enough results to fill more than one page, and the remaining two groups chose not to look past the first page. this contrasts with jansen and spink's findings, in which 73 percent of web searchers only view the first results page.17 figure 3. results from pretest interview, groups 1–4. figure 4. results from pretest interview, groups 5–8. another pleasant surprise was that students spent some time actually reading through results when they were searching for ways to narrow their topic. five groups scanned through both titles and abstracts, which requires clicking on the article titles to display the citation view. one of these five additionally chose to open full-text articles and look at the references to determine relevance. two groups scanned through the results pages only, but looked at both article titles and the subjects in the left-hand column. group 5 seemed to only scan the titles in the results list. this user behavior is also quite different from that found with web search-engine users. in one recent study by jansen and spink, more than 90 percent of the time, search-engine users viewed five or fewer documents per query.18 the five groups that chose to view the citation/abstract view by clicking on the title (groups 1, 2, 3, 4, and 6) identified subtopics that were significantly more interesting and plausible than the general topic they had come up with. from looking at their results, these groups were clearly identifying their subtopics from reading the abstracts and titles rather than just brainstorming. although group 2 had the weakest subtopics, going from the world baseball classic to specific players' relationships to the classic and the home-run derby, they were working with a results set of only eleven items. the three groups that relied on scanning only the results list succeeded to an extent, but as a whole, the new subtopics would be much less satisfying to the scenario's hypothetical professor. after scanning the titles on two pages of results, group 5 (an individual) ended up brainstorming her subtopics (prevention, intervention, and what an eating disorder looks like) based on her knowledge of the topic rather than drawing from the results. group 7 (a group of two) identified their subtopic (sand dunes) from the left-hand column on the results list. group 8 (an individual) picked up his subtopics (steroids in sports, president bush's stance on steroids, and softball) from reading keywords in the article titles on the first page of results. since the subjects in the left-hand column were a new addition to basic search, the use of this area was also noted. four groups used the subjects in the left-hand column without prompting. two groups saw the subjects (i.e., ran the mouse over them) but did not use them. the remaining two groups made no action related to the subjects. a worrisome finding of tasks 2 and 3 was that most students had trouble with the default search being set to phrase-searching rather than to a boolean and. this can easily be seen in looking at the number of results the students came up with when they tried to refine their topics (figure 7).
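to make the distinction concrete, here is a minimal sketch, using invented titles and hypothetical code (not ebsco's implementation), of how phrase searching and a boolean and of the same terms treat the same small set of records:

```python
import re

# illustrative sketch only: phrase searching vs. a boolean and of the same terms.
titles = [
    "eating disorders prevention programs in schools",
    "prevention of eating disorders in adolescents",
    "media images, body image, and the prevention of eating disorders",
]

def tokens(text):
    # break a string into lowercase word tokens
    return re.findall(r"[a-z]+", text.lower())

def phrase_match(query, text):
    # phrase searching: the terms must appear together, in order
    return query.lower() in text.lower()

def and_match(query, text):
    # boolean and: each term may appear anywhere in the record
    words = set(tokens(text))
    return all(term in words for term in tokens(query))

query = "eating disorders prevention"
print(sum(phrase_match(query, t) for t in titles))  # 1 -- only the exact wording matches
print(sum(and_match(query, t) for t in titles))     # 3 -- all three contain every term
```

phrase searching returns only records containing the exact wording, which is why the students' multi-word topics often produced few or no results.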
even though most students had some limiter still in effect (full text, last two years) when they first tried their new refined search, it was the phrase-searching that really hurt them. luckily, this is a customizable setting in ebsco's administrator module, and it is recommended that libraries enable the "proximity" expander to be set "on" by default, which will automatically combine search terms with and. figure 5. tasks posed for each portion of the usability test. figure 6. task 1, coming up with a general topic using basic search. task 4, finding a "recent article in the economist about the october earthquake in kashmir," was designed to test the usability of the ebscohost publication search and limiter. it was listed as optional in case the facilitator was worried that time was an issue. four of the student groups—1, 2, 5, and 7—were posed the task. of these four groups, three relied entirely on the publication limiter on the refine search panel. group 1 chose to use the publication search. all four groups quickly and successfully completed this task. ■ additional questions during basic search tasks at various points during the three tasks in ebsco's basic search, the students were asked to limit their results set to only full-text results, to find one peer-reviewed article, and to limit their search to the past two years. seven out of the eight student groups had no problem finding and using the ebscohost "refine search" panel, including the full-text check box, date limiter, and peer-reviewed limiter. group 7 did not find the refine search panel or use its limiters until specifically guided by the facilitator near the end. this group had found other ways to apply limits: they used the "books/monographs" tab on the results list to limit to full text, and the results-list sorting function to limit to the past two years. after having seen the refine search panel, group 7 did use the "peer reviewed" check box to find their peer-reviewed article. toward the end of the basic search portion, students were asked to "save three of their results for later." three groups demonstrated full use of the folder. an additional three groups started to use the folder and viewed it, but did not print, save, or e-mail. it is unclear whether they knew how to do so and just did not follow through, or whether they thought they had safely stored the items. two students did not use the folder at all, acting individually on items. one group used the "save" function but did not save each article. ■ visual search as in task 1 with basic search, students did not discover general topics by using the interface, but simply typed in a topic of interest. only two groups, 1 and 8, chose to try the same topic again. in the interests of processing time, visual search limits the search to the first 250 results retrieved. since jmu has set the default sort to display results in chronological order, the most recent 250 results were returned during these usability tests. figure 8 shows the students' original search terms using visual search, the actions they took while looking for subtopics, and the subtopics they identified. additionally, if the subtopics they identified matched words on the screen, the location of those words is noted. three of the groups (1, 2, and 5) identified subtopics when looking at the labels on topic and subtopic circles. group 3 identified subtopics while looking at article titles as well as the subtopic circles.
the members of group 6 identified subtopics while looking at the citation view and reading the abstract and full text, as well as rolling over article titles with their mice. it was not entirely clear where the student in group 4 got his subtopics from. two of the three subtopics did not seem to be represented in the display of the results set. his third subtopic was one of the labels from a subtopic circle. figure 7. basic search, tasks 2 and 3, coming up with subtopics. groups 7 and 8 both struggled with finding their subtopics. group 7 simply had a narrow topic ("jackalope"), and group 8 misspelled "steroids" and got few results for that reason. lacking many clusters, both groups tried typing additional terms into the title keyword box on the filter panel, resulting in fewer or zero results. for task 3, students were asked to limit their search to the last two years and to refine their search to a chosen subtopic (figure 9). particularly because the results set is limited to 250, it would have been better to have separated these two tasks: first to have the students narrow the content, then perhaps the date of the search. three groups, all groups of two, used the date limit first (2, 6, and 8). three groups (1, 3, and 6) narrowed the content of their search by typing a new search or additional keywords into the main search box. groups 2 and 4 narrowed the content of their search by clicking on the subtopic circles. note that this does not change the count of the number of results displayed in the filter panel. groups 5 and 7 tried typing keywords into the title keyword filter panel and also clicking on circles. both groups fared better with the latter approach. group 8 typed an additional keyword into the filter panel box to narrow his search. while five of the groups announced the subtopic to which they wanted to narrow their search before beginning to narrow their topic, groups 2, 7, and 8 began to interact with the interface and experiment with subtopics before choosing one. while groups 2 and 8 arrived at a subtopic and identified it, group 7 tried many experiments, but since their original topic (jackalope) was already narrow, they were not ultimately successful in identifying or searching on a subtopic. as with basic search, students were asked to save three articles for later. five of the groups (2, 4, 5, 6, and 8) used the "add to folder" function, which appears in the citation view on the right-hand side of the screen. of these, three groups proceeded to "folder has items." of these groups, two chose the "save" function. two groups used either "save" or "e-mail" to preserve individual items, rather than using the folder. one group experienced system slowness and was not able to load the full-record view in time to determine whether they would be able to save items for later. a concern is that students may not realize that, whether used in folder view or on individual items, the "save" button really just formats the records. the user must still use a browser function to save the formatted page. no student performed this function. figure 8. visual search, tasks 1 and 2, coming up with a general topic. figure 9. visual search, task 3, searching on subtopic (before date limit, if possible). several students had some trouble with the mechanics of the filter panel, shown in figure 10.
seven of the eight groups found and used the filter panel, originally hidden from view, without assistance. however, some users were not sure how the title keyword box related to the main search box. at least two groups typed the same search string into the title keyword box that they had already entered into the main search box. also, users were not sure whether they needed to click the search button after using the date limiter. however, in no case was a student unable to quickly recover from these areas of confusion. ■ results of posttest interview at the end of the entire usability session, participants were asked several questions while looking at screenshots of each interface. a full list of posttest interview questions can be found in figure 11. when speaking about the strengths of basic search, seven of eight groups talked about the search options, such as field searching and limiters. the individual in group 1 mentioned "the ability to search in fields, especially for publications and within publications." one of the students in group 3 mentioned that "i thought it was easier to specify the search for the full text and the peer reviewed—it had a separate page for that." the student in group 4 added, "they give you all the filter options as opposed to the other one." five of the eight groups also mentioned familiarity with the type of interface as a strength of basic search. since jmu has had access to ebsco databases for less than a year, and half of the students admitted they had not used ebsco, it seemed their comments were about the style of interface more than their experience with the interface. the student in group 1 commented, "seems like the standard search engine." group 2 noted, "it was organized in a way that we're used to more," and group 3 said, "it's more traditional so it's more similar to other programs." half of the groups mentioned that basic search was clear or organized. group 6 explained, "it was nice how it was really clearly set out . . . like, everything's in a line." not surprisingly, visual search's strengths surrounded the grouping of subtopics: seven of eight groups made some comment about this. the student in group 4 said, "it groups the articles for you better. it kinda like gives you the subtopics when you get into it and search it and that's pretty cool." the student in group 8 stated, "you can look and see an outline of where you want to go . . . it's easy to pinpoint it on screen like that's where i want to go with my research." some of the other strengths mentioned about visual search were showing a lot of information on one screen without scrolling (group 7) and the colorful nature of the interface. a student in group 2 added, "i like the circles and squares—the symbols register easily." the only three weaknesses listed for basic search in response to the first question were: "not having a spot to put in words not to search for" (group 1); that, like internet search engines, basic search should have "a clip from the article that has the keyword in it, the line before and the line after" (group 6); and that basic search might be too broad, because "unless you narrow it, [you have to] type in keywords to narrow it down yourself" (group 7). figure 10. visual search filter panel. figure 11. posttest interview questions. with regard to weaknesses of visual search, half of the groups had some confusion about the content, partially due to the limited number of results.
a student from group 7 declared, “it may not have as many results. . . . if you typed in ‘school’ on the other one, it might have . . . 8,000 pages [but] on this you have . . . 50 results.” the student in group 5 agreed, saying that with visual search, “they only show you a certain number of articles.” the student in group 1 said, “it’s kind of confusing when it breaks it up into the topics for you. it may be helpful for some other people, but for the way my mind works i like just having all my results displayed out like on the regular one.” half of the groups also made some comment that they were just not used to it. six of the groups were asked which one they would choose if they had class in one hour. (it is not clear why the facilitator did not ask this question of groups 3 and 8.) four groups (1, 2, 5, and 7) indicated basic search. one student in group 2 said, “i think it’s easier to use, but i don’t trust it.” the other in group 2 added, “it’s new and we’re not quite sure because every other search engine is you just type in words and it’s not graphical.” both students in group 7 commented that the familiarity of basic search was the reason they would use it for class in one hour. both groups 2 and 7 would later say that they liked the visual search interface better. two groups (4 and 6) chose visual search for the “class in one hour” scenario. the student in group 4 commented, “because it does cool things for you, makes it easier to find. otherwise you’re going through by title.” both these groups would later also say that they liked the visual search interface better. the students were also asked to describe two scenarios, one in which they would use basic search and one in which they would use visual search. four of the groups (1, 3, 5, and 6) said they would use basic search when they knew what information they needed. seven of the eight groups said they would use visual search for broad topics. all the students’ responses are given in figure 12. when asked which interface they preferred, the groups split evenly. comments from the four who preferred basic search (1, 3, 5, and 8) centered on the familiarity of the interface. the student in group 5 added, “the regular one . . . i like to get things done.” all four of these students had said they had used an ebsco database before. the two students who could list library research databases by name were both in this group. of the four who preferred visual search (2, 4, 6, and 7), three groups had never used ebsco before, though one of the students in group 7 thought he’d used it in the library web tutorial. group 2 commented, “it seemed like it had a lot more information . . . cool . . . futuristic.” the student in group 4 said, “it’s kind of like a little game. . . . like you’re trying to find the hidden piece.” group 7 commented that visual search was colorful and intriguing. the students in group 6 both stated “the visual one” in unison. one student said that visual search was more “[eye-catching] . . . it keeps you focused at what you are doing, i felt, instead of . . . words . . . you get to look at colors” and added later that it was “fun.” the other students in group 6 said, “i’m a very visual learner. so to see instead of having to read the categories, and say oh this is what makes sense, i see the circles like ‘abilities test’ or ‘academic achievement’ and i automatically know that’s what it is . . . and i can see how many articles are in it . . . 
and you click on it and it zooms in and you have all of them there." the second student went on to add, "i've been teaching my mom how to use technology and the visual search would be so much easier for her to get, because it just looks like someone drew it on there like this is a general category and then it breaks it down." other comments given during the free-comment portion of the survey included a request to have the filters from basic search appear on visual search (especially peer-reviewed); curiosity about when visual search would become available (at the time it was in beta test); and a suggestion to have general-education writing students write their first paper using visual search. figure 12. examples of two situations: one in which you would be more likely to use visual search, and one in which you would be more likely to use ebsco. ■ discussion this evaluation is limited both because most students chose different topics for each search interface, and because they only had time to research one topic in each interface. therefore, there could be an infinite number of scenarios in which they would have performed differently. however, this study does show that, for some students, or for some search topics, visual search will help students in a way that basic search may not. one hypothesis of this study was that within the context of library research databases, visual searching would provide a needed alternative to traditional, text-based searching. the success of the students was observed in three areas: the quality of the subtopics they identified after interacting with their search results; the improvement of the chosen subtopic over their chosen general topic; and the quality of the results they found for their subtopic search. the researcher made a best effort to compare topics and results sets and decide which interface helped the student groups to perform better. in addition, qualities that each interface seemed to contribute to the students' search process were noted (figure 13). these qualities were determined by reviewing the video recordings and examining the ways in which either interface seemed to support the attitudes and behaviors of the students as they conducted their research tasks. when considering all three of these areas, four groups did not, overall, require visual search as an alternative to basic search (1, 3, 4, and 7). two of these groups (4 and 7) seemed to benefit from more focus when using the basic search interface. although visual search lent them more interaction and exploration (which may be why they said they preferred visual search), it seems the focus was more important to their performance. for the other two groups (1 and 3), basic search really supported the depth of inquiry and high interest in finding results. these two groups confirmed that they preferred basic search. for two groups (6 and 8), visual search seemed an equally viable alternative to basic search. for group 6, both interfaces seemed to support the group's desire to explore; they said they preferred visual search. for the student in group 8, basic search seemed to orient him to the goal of finding results, while visual search supported a more exploratory approach. since, in his case, this exploratory approach did not turn out well in the area of finding results, it is not surprising that he ended up preferring basic search.
the remaining two groups (2 and 5) performed better with visual search, upholding the hypothesis that an alternate search is needed. group 2 seemed bored and uninterested in the search process when using basic search even though they chose a topic of personal interest: "world baseball classic." visual search caught their attention and sparked interest in the impersonal topic "global warming." group 2 spent more time exploring while using the visual search interface, and in the posttest survey admitted that they preferred the visual search interface. the student in group 5 said she preferred basic search, and as a self-described psycinfo user, seemed comfortable with the interface. yet for this test scenario, visual search made her think of new ideas and supported more real exploration during the search process. among the three areas, basic search appeared to have the upper hand in both the quality of the subtopics identified by the students and the improvement of the chosen subtopics over the general topics. this is at least partially explained by the limitation of visual search to the most recent 250 results. that is, as the students explored the visual search results, choosing subtopics would not relaunch a search on that subtopic, which would have engendered more and perhaps better subtopics. in the third area, the quality of the results set for the chosen topic, visual search seemed to have the upper hand if only because of the phrase-searching limitation present in jmu's administrative settings for basic search. that is, students were often finding few or no results on their chosen subtopics in basic search. figure 13. strengths of basic search and visual search in quality of subtopics, most improved topic, and result sets. this study also had findings that seem to transcend these interfaces and the underlying database. first, libraries should strongly consider changing their database default searching from phrase searching to a boolean and, if possible. (this is possible in ebsco using the administrative module.) second, most students did not have trouble finding or using the interface widgets to perform limiting functions, with the one exception being some confusion about the relationship between the visual search filters and main search box. third, unlike what some research into web search behavior has found, students may well travel beyond the first page of results and view more than just a few documents when determining relevance. finally, the presence of subject terms in both interfaces proved to be an aid to understanding results sets. this study also pointed out some improvements that could be made to visual search. first, it would be great if visual search returned more than 250 results in the initial set, or at least provided an overview of the size, type, and extent of objects using available metadata.19 however, even with today's high-speed connections, result-set size will need to be balanced with performance. perhaps, as students click on subtopics, the software could rerun the search so that the results set does not stay limited to the original 250. on a minor note, for both basic and visual search, greater care should be taken to make sure users understand how the save function works and to alert them to the need to use the browser function to complete the process. it should be noted that ebsco has not stopped developing visual search, and many of these improvements may well be on their way.
ebsco says it will be adding more support for limiters, display preferences, and contextual text result-list viewing at some point in the future. these feature sets can currently be viewed on grokker.com. an important area for future research is user behavior in library subscription databases. while these usability tests provide a qualitative evaluation of a specific interface, it would be worthwhile to have a more reliable understanding of students' searching behavior in library databases across similar interfaces. since public service librarians deal primarily with users who have self-identified as needing help, their experience does not always describe the behavior of all users. furthermore, studies of web search behavior may not apply directly to searching in research databases. specifically, students' use of subject terms in both interfaces could be explored. half of the student groups in this study chose to use the basic search subject clusters in the left-hand column on the results page, despite the fact that they had never seen them before (this was a beta-test feature). is this typical? would this strategy hold up to a variety of research topics? another interesting question is the use of a single search box versus several search boxes arrayed in rows (to assist in constructing boolean and field searching). in the ebsco administrative module, librarians can choose either option. based on research rather than anecdotal evidence, which is best? another option is the default sort: historically, at jmu libraries, this has been a chronological sort. does this cause problems for relevance-thinking students? finally, the issue of collaboration in student research using library research databases would be a fascinating topic. certainly, these usability recordings could be reviewed with a mind to capturing the differences between individuals and groups of two, but there may be better designs for a more focused study of this topic. ■ conclusion if you take away one conclusion from this study, let it be this: do not hesitate to try visual search with your users! information providers must balance investments in cutting-edge technology with the demands of their users. libraries and librarians, of course, are a key user group for information providers. a critical need in librarianship is to become familiar with the newest technology solutions, particularly with regard to searching, in order to provide vendors with informed feedback about which technologies to pursue. by using and teaching new visual search alternatives, librarians will be poised to influence the further development of alternatives to text-based searching. references and notes 1. bernard j. jansen and amanda spink, "how are we searching the world wide web? a comparison of nine search engine transaction logs," special issue, information processing and management 42, no. 1 (2006): 257. 2. bernard j. jansen and amanda spink, "an analysis of web documents retrieved and viewed," in proceedings of the 4th international conference on internet computing (las vegas, 2003), 67. 3. aravindan veerasamy and nicholas j. belkin, "evaluation of a tool for visualization of information retrieval results," sigir forum (acm special interest group on information retrieval) (1996): 85–93; katy börner and javed mostafa, "jodl special issue on information visualization interfaces for retrieval and analysis," international journal on digital libraries 5, no.
1 (2005): 1–2; ozgur turetken and ramesh sharda, "clustering-based visual interfaces for presentation of web search results: an empirical investigation," information systems frontiers 7, no. 3 (2005): 273–97. 4. stephen greene et al., "previews and overviews in digital libraries: designing surrogates to support visual information seeking," journal of the american society for information science 51, no. 4 (2000): 380–93; panayiotis zaphiris et al., "exploring the use of information visualization for digital libraries," new review of information networking 10, no. 1 (2004): 51–69. 5. katy börner and chaomei chen, eds., visual interfaces to digital libraries, 1st ed. (berlin; new york: springer, 2003), 243. 6. zaphiris et al., "exploring the use of information visualization for digital libraries," 51–69. 7. börner and chen, visual interfaces to digital libraries, 243. 8. greene et al., "previews and overviews in digital libraries," 380–93. 9. "vivisimo corporate profile," in vivisimo, http://vivisimo.com/html/about (accessed apr. 19, 2006). 10. "aquabrowser library—fiction connection," www.fictionconnection.com/ (accessed apr. 19, 2006). 11. "queens library—aquabrowser library," http://aqua.queenslibrary.org/ (accessed apr. 19, 2006). 12. "xrefer—research mapper," www.xrefer.com/research (accessed apr. 19, 2006). 13. "stanford 'groks,'" http://speaking.stanford.edu/back_issues/soc67/library/stanford_groks.html (accessed apr. 19, 2006); "grokker at stanford university," http://library.stanford.edu/catdb/grokker/ (accessed apr. 19, 2006). 14. "ebsco has partnered with groxis to deliver an innovative visual search feature as part of ebsco," www.groxis.com/service/grokker/pr29.html (accessed apr. 19, 2006). 15. michael dolenko, christopher smith, and martha e. williams, "putting the user into usability: developing customer-driven interfaces at west group," in proceedings of the national online meeting 20 (medford, n.j.: learned information, 1999), 81–90; e. t. morley, "usability testing: the silverplatter experience," cd-rom professional 8, no. 3 (1995); ron stewart, vivek narendra, and axel schmetzke, "accessibility and usability of online library databases," library hi tech 23, no. 2 (2005): 265–86; nicholas tomaiuolo, "deconstructing questia: the usability of a subscription digital library," searcher 9, no. 7 (2001): 32–39; b. hamilton, "comparison of the different electronic versions of the encyclopaedia britannica: a usability study," electronic library 21, no. 6 (2003): 547–54; heather l. munger, "testing the database of international rehabilitation research: using rehabilitation researchers to determine the usability of a bibliographic database," journal of the medical library association (jmla) 91, no. 4 (2003): 478–83; frank cervone, "what we've learned from doing usability testing on openurl resolvers and federated search engines," computers in libraries 25, no. 9 (2005): 10–14; alexei oulanov and edmund f. y. pajarillo, "usability evaluation of the city university of new york cuny+ database," electronic library 19, no. 2 (2001): 84–91; steve brantley, annie armstrong, and krystal m. lewis, "usability testing of a customizable library web portal," college & research libraries 67, no. 2 (2006): 146–63; carole a. george, "usability testing and design of a library web site: an iterative approach," oclc systems & services 21, no. 3 (2005): 167–80; leanne m.
vandecreek, "usability analysis of northern illinois university libraries' web site: a case study," oclc systems & services 21, no. 3 (2005): 181–92; susan goodwin, "using screen capture software for web-site usability and redesign buy-in," library hi tech 23, no. 4 (2005): 610–21; laura cobus, valeda frances dent, and anita ondrusek, "how twenty-eight users helped redesign an academic library web site," reference & user services quarterly 44, no. 3 (2005): 232–46. 16. "morae usability testing for software and web sites," www.techsmith.com/morae.asp (accessed apr. 19, 2006). 17. jansen and spink, "an analysis of web documents retrieved and viewed," 67. 18. ibid. 19. greene et al., "previews and overviews in digital libraries," 381. wikis in libraries matthew m. bejune (mbejune@purdue.edu) is an assistant professor of library science at purdue university libraries. he also is a doctoral student at the graduate school of library and information science, university of illinois at urbana-champaign. wikis have recently been adopted to support a variety of collaborative activities within libraries. this article and its companion wiki, librarywikis (http://librarywikis.pbwiki.com/), seek to document the phenomenon of wikis in libraries. this subject is considered within the framework of computer-supported cooperative work (cscw). the author identified thirty-three library wikis and developed a classification schema with four categories: (1) collaboration among libraries (45.7 percent); (2) collaboration among library staff (31.4 percent); (3) collaboration among library staff and patrons (14.3 percent); and (4) collaboration among patrons (8.6 percent). examples of library wikis are presented within the article, as is a discussion of why wikis are primarily utilized within categories i and ii and not within categories iii and iv. it is clear that wikis have great utility within libraries, and the author urges further application of wikis in libraries. in recent years, the popularity of wikis has skyrocketed. wikis were invented in the mid-1990s to help facilitate the exchange of ideas between computer programmers. the use of wikis has gone far beyond the domain of computer programming, and now it seems as if every google search contains a wikipedia entry. wikis have entered into the public consciousness. so, too, have wikis entered into the domain of professional library practice. the purpose of this research is to document how wikis are used in libraries. in conjunction with this article, the author has created librarywikis (http://librarywikis.pbwiki.com/), a wiki to which readers can submit additional examples of wikis used in libraries. the article will proceed in three sections. the first section is a literature review that defines wikis and introduces computer-supported cooperative work (cscw) as a context for understanding wikis. the second section documents the author's research and presents a schema for classifying wikis used in libraries. the third section considers the implications of the research results. ■ literature review what's a wiki? wikipedia (2007a) defines a wiki as: a type of web site that allows the visitors to add, remove, edit, and change some content, typically without the need for registration. it also allows for linking among any number of pages. this ease of interaction and operation makes a wiki an effective tool for mass collaborative authoring. wikis have been around since the mid-1990s, though it is only recently that they have become ubiquitous.
in 1995, ward cunningham launched the first wiki, wikiwikiweb (http://c2.com/cgi/wiki), which is still active today, to facilitate the exchange of ideas among computer programmers (wikipedia 2007b). the launch of wikiwikiweb was a departure from the existing model of web communication, where there was a clear divide between authors and readers. wikiwikiweb elevated the status of readers, if they so chose, to that of content writers and editors. this model proved popular, and the wiki technology used on wikiwikiweb was soon ported to other online communities, the most famous example being wikipedia. on january 15, 2001, wikipedia was launched by larry sanger and jimmy wales as a complementary project for the now-defunct nupedia encyclopedia. nupedia was a free, online encyclopedia with articles written by experts and reviewed by editors. wikipedia was designed as a feeder project to solicit new articles for nupedia that were not submitted by experts. the two services coexisted for some time, but in 2003 the nupedia servers were shut down. since its launch, wikipedia has undergone rapid growth. at the close of 2001, wikipedia's first year of operation, there were 20,000 articles in eighteen language editions. as of this writing, there are approximately seven million articles in 251 languages, fourteen of which have more than 100,000 articles each. as a sign of wikipedia's growth, when this manuscript was first submitted four months earlier, there were more than five million articles in 250 languages. author's note: sources in the previous two paragraphs come from wikipedia. the author acknowledges the concerns within the academy regarding the practice of citing wikipedia within scholarly works; however, it was decided that wikipedia is arguably an authoritative source on wikis and itself. nevertheless, the author notes that there were changes—insubstantial ones—to the cited wikipedia entries between when the manuscript was first submitted and when it was revised four months later. wikis and cscw wikis facilitate collaborative authoring and can be considered one of the technologies studied under the domain of cscw. in this section, cscw is explained and it is shown how wikis fit within this framework. cscw is an area of computer science research that considers the application of computer technology to support cooperative, also referred to as collaborative, work. the term was first coined in 1984 by irene greif (1988) and paul cashman to describe a workshop they were planning on the support of people in work environments with computers. over the years there have been a number of review articles that describe cscw in greater detail, including bannon and schmidt (1991), rodden (1991), schmidt and bannon (1992), sachs (1995), dourish (2001), ackerman (2002), olson and olson (2002), dix, finlay, abowd, and beale (2004), and shneiderman and plaisant (2005). publication in the field of cscw primarily occurs through conferences. the first conference on cscw was held in 1986 in austin, texas. since then, the conference has been held biennially in the united states. proceedings are published by the association for computing machinery (acm, http://www.acm.org/).
in 1991, the first european conference on computer supported cooperative work (ecscw) was held in amsterdam. ecscw also is held biennially, in odd-numbered years. ecscw proceedings are published by springer (http://www.ecscw.uni-siegen.de/). the primary journal for cscw is computer supported cooperative work: the journal of collaborative computing. publications also appear within publications of the acm and chi, the conference on human factors in computing. cscw and libraries as libraries are, by nature, collaborative work environments—library staff working together and with patrons—and as digital libraries and computer technologies become increasingly prevalent, there is a natural fit between cscw and libraries. the following researchers have applied cscw to libraries. twidale et al. (1997) published a report sponsored by the british library research and innovation centre that examined the role of collaboration in the information-searching process to inform how information systems design could better address and support collaborative activity. twidale and nichols (1998) offered ethnographic research of physical collaborative environments—in a university library and an office—to aid the design of digital libraries. they wrote two reviews of cscw as applied to libraries—the first was more comprehensive (twidale and nichols 1998) than the second (twidale and nichols 1999). sánchez (2001) discussed collaborative environments designed and prototyped for digital library environments. classification of collaboration technologies that facilitate collaborative work are typically classified within cscw across two continua: synchronous versus asynchronous, and co-located versus remote. if put together in a two-by-two matrix, there are four possibilities: (1) synchronous and co-located (same time, same place); (2) synchronous and remote (same time, different place); (3) asynchronous and remote (different time, different place); and (4) asynchronous and co-located (different time, same place). this classification schema was first proposed by johansen et al. (1988). nichols and twidale (1999) mapped work applications within the realm of cscw in figure 1. wikis are not present in the figure, but their absence is not an indication that they are not cooperative work technologies. rather, wikis were not yet widely in use at the time cscw was considered by nichols and twidale. the author has added wikis to nichols and twidale's graphical representation in figure 2. interestingly, wikis are border-crossers fitting within two quadrants: asynchronous and co-located, and asynchronous and remote. wikis are asynchronous in that they do not require people to be working together at the same time. they are both co-located and remote in that people working collaboratively do not need to be working in the same place. it is also interesting to note that library technologies also can be mapped using johansen's schema. nichols and twidale (1999) also mapped this, and figure 3 illustrates the variety of collaborative work that goes on within libraries. figure 1. classification of cscw applications. synchronous and co-located: meeting rooms. synchronous and remote: distributed meetings, muds and moos, shared drawing, video conferencing, collaborative writing. asynchronous and co-located: team rooms. asynchronous and remote: organizational memory, workflow, web-based applications, collaborative writing. figure 2. classification of cscw applications including wikis. as in figure 1, with wikis added to both the asynchronous and co-located quadrant and the asynchronous and remote quadrant. figure 3. classification of collaborative work within libraries. the same time/place matrix applied to library activities, including personal help, the reference interview, issuing a book on loan, face-to-face interactions, use of opacs, database searching, video conferencing, the telephone, notice boards, post-it notes, memos, documents for study, social information filtering, e-mail, voicemail, distance learning, and postal services. ■ method in order to discover the widest variety of wikis used in libraries, the author searched for examples of wikis used in libraries within three areas—the lis literature, the library success wiki, and messages posted on three professional electronic discussion lists. when examples were found, they were logged and classified according to a schema created by the author. results are presented in the next section.
the first area searched was within the lis literature. the author utilized the wilson library literature and information science database. there were two main types of articles: ones that argued for the use of wikis in libraries, and ones that were case studies of wikis that had been implemented. the second area searched was within library success: a best practices wiki (http://www.libsuccess.org/) (see figure 4), created by meredith farkas, distance learning librarian at norwich university. as the name implies, it is a place for people within the library community to share their success stories. posting to the wiki is open to the public, though registration is encouraged. there are many subject areas on the wiki, including management and leadership, readers' advisory, reference services, information literacy, and so on. there also is a section about collaborative tools in libraries (http://www.libsuccess.org/index.php?title=collaborative_tools_in_libraries), in which examples of wikis in libraries are presented. within this section there is a presentation about wikis made by farkas (2006) titled wiki world (http://www.libsuccess.org/index.php?title=wiki_world), from which examples were culled. figure 4. library success: a best practices wiki (http://www.libsuccess.org/). figure 5. wiki world (http://www.libsuccess.org/index.php?title=wiki_world). the third area that was searched was professional electronic discussion list messages from web4lib, dig_ref, and libref-l. the web4lib electronic discussion list (tennant 2005) is "for the discussion of issues relating to the creation, management, and support of library-based world wide web servers, services, and applications." the list is moderated by roy tennant and the web4lib advisory board and was started in 1994. the dig_ref electronic discussion list is a forum for "people and organizations answering the questions of users via the internet" (webjunction n.d.). the list is hosted by the information institute of syracuse, school of information studies, syracuse university, and was created in 1998. the libref-l electronic discussion list is "a moderated discussion of issues related to reference librarianship" (balraj 2005). established in 1990, it is operated out of kent state university and moderated by a group of list owners. these three electronic discussion lists were selected for two reasons. first, the author is a subscriber to each electronic discussion list and, prior to the research, had noted that there were messages about wikis in libraries. second, based on the descriptions of each electronic discussion list stated above, the selected electronic discussion lists reasonably covered the discussion of wikis in libraries within the professional library electronic discussion lists. one year of messages, november 15, 2005, through november 14, 2006, was analyzed for each list. messages about wikis in libraries were identified through keyword searches against the author's personal archive of electronic discussion list messages collected over the years.
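as a minimal sketch of this kind of keyword screening (the messages and code below are invented for illustration; the actual searching was done in the author's microsoft outlook archive, as described next), a match on the string "wiki" casts a wide net that still has to be read through by hand:

```python
# hypothetical sketch: screening discussion-list messages for the word "wiki".
messages = [
    {"list": "web4lib", "body": "has anyone set up a wiki for library instruction handouts?"},
    {"list": "dig_ref", "body": "reminder: virtual reference training session next week"},
    {"list": "libref-l", "body": "the nature study compared wikipedia with britannica"},
]

hits = [m for m in messages if "wiki" in m["body"].lower()]
print(len(hits))  # 2 -- high recall, but the wikipedia message is a false hit, so each
                  # hit must still be read to find wikis actually used in libraries
```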
an alternative method would have been to search the web archive of each list, but the author found it easier to search within his mail client, microsoft outlook. the word "wiki" was found in 513 messages: 354 in web4lib, 91 in dig_ref, and 68 in libref-l. this approach had high recall, as discourse about wikis frequently included the use of the word "wiki," though low precision, as there were many results that were not about wikis used in libraries. common false hits included messages about the nature study (giles 2005) that compared wikipedia to encyclopedia britannica, and messages that included the word "wiki" but were simply referring to wikis, though not examples of wikis used within libraries. from the list of 513 messages, the author read each message and came up with a much shorter list of thirty-nine messages about wikis in libraries: thirty-two in web4lib, three in dig_ref, and four in libref-l. ■ results classification of the results after all wiki examples had been collected, it became clear that there was a way to classify the results. in farkas's (2006) presentation about wikis, she organized wikis in two categories: (1) how libraries can use wikis with their patrons; and (2) how libraries can use wikis for knowledge sharing and collaboration. this schema, while it accounts for two types of collaboration, is not granular enough to represent the types of collaboration found within the wiki examples identified. as such, it became clear that another schema was needed. twidale and nichols (1998) identified three types of collaboration within libraries: (1) collaboration among library staff; (2) collaboration between a patron and a member of staff; and (3) collaboration among library users. their classification schema mapped well to the examples of wikis that were identified; however, it too was not granular enough, as it did not distinguish between intraorganizational and extraorganizational collaboration among library staff, the two most common types of wiki usage found in the research (see appendix). to account for these types of collaboration, which are common not only to wiki use in libraries but to all professional library practice, the author modified twidale and nichols's schema (see figure 6). figure 6. four types of collaboration within libraries: 1. collaboration among libraries (extra-organizational); 2. collaboration among library staff (intra-organizational); 3. collaboration among library staff and patrons; 4. collaboration among patrons. the improved schema also uniformly represents entities across the categories—library staff and member of staff are referred to as "library staff," and patrons and library users are referred to as "patrons." examples of wikis used in libraries for each category are provided to better illustrate the proposed classification schema.
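as a hypothetical illustration (not code from the original study), the schema amounts to assigning each wiki to one of the four categories and tallying the assignments, which is how a distribution like the one in table 1 can be produced; the example assignments below are drawn from wikis discussed in this article:

```python
from collections import Counter

# hypothetical sketch: tallying example wikis by the four categories in figure 6.
wikis = {
    "library instruction wiki": "i: collaboration among libraries",
    "library success: a best practices wiki": "i: collaboration among libraries",
    "university of connecticut libraries' staff wiki": "ii: collaboration among library staff",
    "health sciences library knowledge base": "ii: collaboration among library staff",
    "sjcpl subject guides": "iii: collaboration among library staff and patrons",
    "wiki worldcat": "iv: collaboration among patrons",
}

counts = Counter(wikis.values())
total = sum(counts.values())
for category, n in sorted(counts.items()):
    print(f"{category}: {n} ({100 * n / total:.1f} percent)")
```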
■ collaboration among libraries the library instruction wiki (http://instructionwiki.org/main_page) is an example of a wiki that is used for collaboration among libraries (figure 7). figure 7. library instruction wiki (http://instructionwiki.org/). it appears as though the wiki was originally set up to support library instruction within oregon—it is unclear if this was associated with a particular type of library, say academic or public—but now the wiki supports library instruction in general. the wiki is self-described as: a collaboratively developed resource for librarians involved with or interested in instruction. all librarians and others interested in library instruction are welcome and encouraged to contribute. the tagline for the wiki is "stop reinventing the wheel" (library instruction wiki 2006). from this wiki there is a list of library instruction resources that include the following: handouts, tutorials, and other resources to share; teaching techniques, tips, and tricks; class-specific web sites and handouts; glossary and encyclopedia; bibliography and suggested reading; and instruction-related projects, brainstorms, and documents. within the handouts, tutorials, and other resources to share section, the author found a wide variety of resources from libraries across the country. similarly, there were a number of suggestions to be found under the teaching techniques, tips, and tricks section. another example of a wiki used for collaboration among libraries is the library success wiki (http://www.libsuccess.org/), one of the sources of examples of wikis used in this research. adding to earlier descriptions of this wiki as presented in this paper, library success seems to be one of the most frequently updated library wikis and perhaps the most comprehensive in its coverage of library topics. ■ collaboration among library staff the university of connecticut libraries' staff wiki (http://wiki.lib.uconn.edu/) is an example of a wiki used for collaboration among library staff (figure 8). figure 8. the university of connecticut libraries' staff wiki (http://wiki.lib.uconn.edu/). this wiki is a knowledge base containing more than one thousand information technology services (its) documents. its documents support the information technology needs of the library organization. examples include answers to commonly asked questions, user manuals, and instructions for a variety of computer operations. in addition to being a repository of its documents, the wiki also serves as a portal to other wikis within the university of connecticut libraries. there are many other wikis connected to library units; teams; software applications, such as the libraries' ils; libraries within the university of connecticut libraries; and other university of connecticut campuses. the health sciences library knowledge base, stony brook university (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome) is another example of a wiki that is used for collaboration among library staff (figure 9). figure 9. health sciences library knowledge base (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome). the wiki is described as "a space for the dynamic collaboration of the library staff, and a platform of shared resources" (health sciences library 2007).
on the wiki there are the following content areas: news and announcements; hsl departments; projects; troubleshooting; staff training resources, working papers and support materials; and community activities, scholarship, conferences, and publications. ■ collaboration among library staff and patrons there are only a few examples of wikis used for collaboration among library staff and patrons to cite as exemplars. one example is the st. joseph county public library (sjcpl) subject guides (http://www.libraryforlife.org/subjectguides/index.php/main_page), seen in figure 10. figure 10. sjcpl subject guides (http://libraryforlife.org/subjectguides/index.php/main_page/). this wiki is a collection of resources and services in print and electronic formats to assist library patrons with subject area searching. as the wiki is published by library staff for public consumption, it has more of a professional feel than wikis from the first two categories. pages have images, and the content is structured to look like a standard web page. though the wiki looks like a web page, there still remain a number of edit links that follow each section of text on the wiki. while these tags bear importance for those editing the wiki—library staff only in this case—they undoubtedly puzzle library patrons who think that they have the ability to edit the wiki when, in fact, they do not. another example of collaboration between library staff and patrons that takes a similar approach is the usc aiken gregg-graniteville library web site (http://library.usca.edu/) in figure 11. figure 11. usc aiken gregg-graniteville library (http://library.usca.edu/). as with the sjcpl subject guides, this wiki looks more like a web site than a wiki. in fact, the usc aiken wiki conceals its true identity as a wiki even more so than the sjcpl subject guides. the only evidence that the web site is a wiki is a link at the bottom of each page that says "powered by pmwiki." pmwiki (http://pmwiki.org/) is a content management system that utilizes wiki technology on the back end to manage a web site while retaining the look and feel of a standard web site. it seems that the benefits of using a wiki in such a way are shared content creation and management. ■ collaboration among patrons as there are only three examples of wikis used for collaboration among patrons, all examples will be highlighted in this section. the first example is wiki worldcat (http://www.oclc.org/productworks/wcwiki.htm), sponsored by oclc. wiki worldcat launched as a pilot project in september 2005. the service allows users of open worldcat, oclc's web version of worldcat, to add book reviews to item records. though this wiki does not have many book reviews in it, even for contemporary bestsellers, it gives a taste of how a wiki could be used to facilitate collaboration among patrons. a second example is the biz wiki from ohio university libraries (http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page) (see figure 12). figure 12. ohio university libraries biz wiki (http://www.library.ohiou.edu/subjects/bizwiki). the biz wiki is a collection of business information resources available through ohio university. the wiki was created by chad boeninger, reference and instruction librarian, as an alternate form of a subject guide or pathfinder. what separates this wiki from those in the third category, collaboration among library staff and patrons, is that the wiki is editable by patrons as well as librarians.
similarly, butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref) is a wiki that has reviews of reference resources created by butler librarians, faculty, staff, and students (see figure 13). figure 13. butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref). full results thirty-three wikis were identified. two wikis were classified in two categories each. the full results are available in the appendix. table 1 illustrates how wikis were not uniformly distributed across the four categories: category i had 45.7 percent, category ii had 31.4 percent, category iii had 14.3 percent, and category iv had 8.6 percent. table 1. classification summary: i, collaboration among libraries, 16 (45.7 percent); ii, collaboration among library staff, 11 (31.4 percent); iii, collaboration among library staff and patrons, 5 (14.3 percent); iv, collaboration among patrons, 3 (8.6 percent); total, 35 (100.0 percent). nearly 80 percent of all examples were found within categories i and ii. as seen in some of the examples in the previous section, wikis were utilized for a variety of purposes. here is a short list of purposes for which wikis were utilized: sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. wiki software utilization is summarized in tables 2 and 3. mediawiki is the most popular software utilized by libraries (33.3 percent), followed by unknown (30.3 percent), pbwiki (12.1 percent), pmwiki (12.1 percent), seedwiki (6.1 percent), twiki (3 percent), and xwiki (3 percent). if the values for unknown are removed from the totals (table 3), mediawiki is utilized in almost half (47.8 percent) of all library wiki applications. table 2. software totals: mediawiki, 11 (33.3 percent); unknown, 10 (30.3 percent); pbwiki, 4 (12.1 percent); pmwiki, 4 (12.1 percent); seedwiki, 2 (6.1 percent); twiki, 1 (3 percent); xwiki, 1 (3 percent); total, 33 (100 percent). table 3. software totals without unknowns: mediawiki, 11 (47.8 percent); pbwiki, 4 (17.4 percent); pmwiki, 4 (17.4 percent); seedwiki, 2 (8.7 percent); twiki, 1 (4.3 percent); xwiki, 1 (4.3 percent); total, 23 (100.0 percent). ■ discussion with a wealth of examples of wikis in categories i and ii and a dearth of examples of wikis in categories iii and iv, the library community seems to be more comfortable using wikis to collaborate within the community, but less comfortable using wikis to collaborate with library patrons or to enable collaboration among patrons. the research results pose the questions: why are wikis predominantly used for collaboration within the library community? and why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another? why are wikis predominantly used for collaboration within the library community? this is perhaps the easier of the two questions to explain. there is a long legacy of cooperation and collaboration intraorganizationally and extraorganizationally within libraries. one explanation for this is the shared budgetary climate within libraries. all too often there is not enough money, staff, or resources to offer desired levels of service. librarians work together to overcome these barriers. prominent examples include cooperative cataloging, interlibrary lending, and the formation of consortia to negotiate pricing. another explanation can be found in the personal characteristics of library professionals. librarianship is a service profession that consequently attracts service-minded individuals who are interested in helping others, whether they are library patrons or fellow colleagues.
a third reason is the role of library associations, such as the international federation of library associations and institutions, the american library association, the special libraries association, and the medical library association, as well as many others at the international, national, state, and local levels, and the work that is done through these associations at annual conferences and throughout the year. libraries use wikis to collaborate intraorganizationally and extraorganizationally because collaboration is what they do most naturally.
figure 12. ohio university libraries biz wiki (http://www.library.ohiou.edu/subjects/bizwiki)
figure 13. butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref)
why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another? the reasons why libraries are only minimally using wikis to collaborate with patrons and for patron collaboration are more difficult to ascertain. however, due to the untapped potential of using wikis, the proposed answers to this question are more important and may lead to future implementations of wikis in libraries. here are four possible explanations, some more speculative than others. first, perhaps one of the reasons is the result of the way in which libraries are conceived by library patrons and librarians alike. a strong case can be made for libraries as places of collaborative work, and the author takes this position. however, historically libraries have been repositories of information, and this remains a pervasive and difficult concept to change—libraries are frequently seen simply as places to get books. in this scenario, the librarian is a gatekeeper that a patron interacts with to get a book—that is, if the patron interacts with a librarian at all. it is also worth noting that the relationship is one-way—the patron needs the assistance of the librarian, but not the other way around. viewed in these terms, this is not a collaborative situation. for libraries to use wikis for the purpose of collaborating with library patrons, it might demand the reconceptualization of libraries by library patrons and librarians. similarly, this extreme conceptualization of libraries does not consider patrons working with one another, even though it is an activity that occurs formally and informally within libraries, not to mention with the emergence of interdisciplinary and multidisciplinary work. if wikis are to be used to facilitate collaboration between patrons, the conceptualization of the library by library patrons and librarians must be expanded. second, there may be fears within the library community about authority, responsibility, and liability. libraries have long held the responsibility of ensuring the authority of the bibliographic catalog. if patrons are allowed to edit the library wiki, there is potential for negatively affecting the authority of the wiki and even the perceived authority of the library. likewise, there is potential liability in allowing patrons to post to the library wiki. similar concerns have been raised in the past about other collaborative technologies, such as blogs, bulletin boards, mailing lists, and so on, all aspects of the library 2.0 movement. if libraries are fully to realize library 2.0 as described by casey and savastinuk (2006), miller (2006), and courtney (2007), these issues must be considered.
table 1. classification summary
category                                              no.      %
i: collaboration among libraries                       16   45.7
ii: collaboration among library staff                  11   31.4
iii: collaboration among library staff and patrons      5   14.3
iv: collaboration among patrons                         3    8.6
total:                                                 35  100.0
table 2. software totals
wiki software    no.      %
mediawiki         11   33.3
unknown           10   30.3
pbwiki             4   12.1
pmwiki             4   12.1
seedwiki           2    6.1
twiki              1    3
xwiki              1    3
total:            33    100
table 3. software totals without unknowns
wiki software    no.      %
mediawiki         11   47.8
pbwiki             4   17.4
pmwiki             4   17.4
seedwiki           2    8.7
twiki              1    4.3
xwiki              1    4.3
total:            23  100.0
third, perhaps it is due to a matter of fit. it might be the case that wikis are utilized in categories i and ii and not within categories iii and iv because the tools are better suited to support the types of activities within categories i and ii. consider some of the activities listed earlier: supporting association work, collecting software documentation, supporting conferences, creating digital repositories, creating intranets, and creating knowledge bases. each of these illustrates a wiki that is utilized for the creation of a resource with multiple authors and readers, tasks that are well-suited to wikis. wikipedia is a great example of a wiki with clear, shared tasks for multiple authors and multiple readers and a sense of persistence over time. in contrast, relationships between library staff and patrons do not typically lead to the shared creation of resources. while it is true that the relationship between patron and librarian in the context of a patron’s research assignment can be collaborative depending on the circumstances, authorship is not shared but is possessed by the patron. in addition, research assignments in the context of undergraduate coursework are short-lived and seldom go beyond the confines of a particular course. in terms of patrons working together with other patrons, there is the precedent of group work; however, groups often produce projects or papers that share the characteristics of nongroup research assignments listed above. this, of course, does not mean that wikis are not suitable for collaboration within categories iii and iv, but perhaps the opportunities for collaboration are fewer or that they stretch the imagination of the types and ways of doing collaborative work. fourth, perhaps it is a matter of “not yet.” while the research has shown that libraries are not utilizing wikis in categories iii and iv, this may be because it is too soon. it should be noted that wikis are still new technologies. it might be the case that librarians are experimenting in safer contexts so they will gain experience prior to trying more public projects where their expertise will be needed. if this explanation is true, it is expected that more examples of wikis in libraries will soon emerge. as they do, the author hopes that all examples of wikis in libraries, new and old, will be added to the companion wiki to this article, librarywikis (http://librarywikis.pbwiki.com/). ■ conclusion it appears that wikis are here to stay, and that their utilization within libraries is only just beginning. this article documented the current practice of wikis used in libraries using cscw as a framework for discussion. the author located examples of wikis in three places: within the lis literature, on the library success wiki, and within messages from three professional electronic discussion lists.
thirty­ three examples of wikis were identified and classified using a classification schema created by the author. the schema has four categories: (1) collaboration among librar­ ies; (2) collaboration among library staff; (3) collaboration among library staff and patrons; and (4) collaboration among patrons. wikis were used for a variety of purposes, including for sharing information, supporting associa­ tion work, collecting software documentation, supporting conferences, facilitating librarian­to­faculty collaboration, creating digital repositories, managing web content, creat­ ing intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. by and large, wikis were primarily used to support collaboration among library staff intraorganiza­ tionally and extraorganizationally, with nearly 80 percent (45.7 percent and 31.4 percent respectively) of the examples so identified, and less so in the support of collaboration among library staff and patrons (14.3 percent) and col­ laboration among patrons (8.6 percent). a majority of the examples of wikis utilized the mediawiki software (47.8 percent). it is clear that there are plenty of examples of wikis utilized in libraries, and more to be found each day. it is at this time that the profession is faced with extending the use of this technology, and it is to the future to see how wikis will continue to be used within libraries. works cited ackerman, mark s. 2002. the intellectual challenge of cscw: the gap between social requirements and technical feasibil­ ity. in human-computer interaction in the new millennium, ed. john m. carroll, 179–203. new york: addison­wesley. balraj, leela, et al. 2005 libref­l. kent state university librar­ ies. http://www.library.kent.edu/page/10391 (accessed june 12, 2007). archive is available at this link as well. bannon, liam j., and kjeld schmidt. 1991. cscw: four charac­ ters in search for a context. in studies in computer supported cooperative work. ed. john m. bowers and steven d. benford, 3–16. amsterdam: elsevier. casey, michael e., and laura c. savastinuk. 2006. library 2.0. library journal 131, no. 14: 40–42. http://www.libraryjournal. com/article/ca6365200.html (accessed june 12, 2007). courtney, nancy. 2007. library 2.0 and beyond: innovative technologies and tomorrow’s user (in press). westport, conn.: libraries unlimited. dix, alan, et al. 2004. socio­organizational issues and stake­ holder requirements. in human computer interaction, 3rd ed., 450–74. upper saddle river, n.j.: prentice hall. dourish, paul. 2001. social computing. in where the action is: the foundations of embodied interaction, 55–97. cambridge, mass: mit pr. article title | author 35wikis in libraries | bejune 35 farkas, meredith. 2006. wiki world. http://www.libsuccess. org/index.php?title=wiki_world (accessed june 12, 2007). giles, jim. 2005. internet encyclopaedias go head to head. nature 438: 900–01. http://www.nature.com/nature/journal/v438/ n7070/full/438900a.html (accessed june 12, 2007). greif, irene, ed. 1988. computer supported cooperative work: a book of readings. san mateo, calif.: morgan kaufmann publishers. health sciences library, state university of new york, stony brook. 2007. health sciences library knowledge base. http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/ webhome (accessed june 12, 2007). johansen, robert, et al. 1988. groupware: computer support for business teams. new york: free press. library instruction wiki. 2006. 
http://instructionwiki.org/ main_page (accessed june 12, 2007). miller, paul. 2006. coming together around library 2.0. dlib magazine 12, no. 4. http://www.dlib.org/dlib/april06/ miller/04miller.html (accessed june 12, 2007). nichols, david m., and michael b. twidale. 1999. com­ puter supported cooperative work and libraries. vine 109: 10–15. http://www.comp.lancs.ac.uk/computing/research/ cseg/projects/ariadne/docs/vine.html (accessed june 12, 2007). olson, gary m., and judith s. olson. 2002. groupware and com­ puter­supported cooperative work. in the human-computer interaction handbook: fundamentals, evolving technologies and emerging applications, ed. julie a. jacko and andrew sears, 583–95. mahwah, n.j.: lawrence erlbaum associates, inc.. rodden, tom t. 1991. a survey of cscw systems. interacting with computers 3, no. 3: 319–54. sachs, patricia. 1995. transforming work: collaboration, learn­ ing, and design. communications of the acm 38: 227–49. sánchez, j. alfredo. 2001. hci and cscw in the context of digi­ tal libraries. in chi ‘01 extended abstracts on human factors in computing systems. conference on human factors in computing systems. seattle, wash., mar. 31–apr. 5 2001. schmidt, kjeld, and liam j. bannon. 1992. taking cscw seri­ ously: supporting articulation work. computer supported cooperative work 1, no. 1/2: 7–40. shneiderman, ben, and catherine plaisant. 2005. collaboration. in designing the user interface: strategies for effective humancomputer interaction, 4th ed., 408–50. reading, mass.: addison wesley. tennant, roy. 2005. web4lib electronic discussion. webjunc­ tion.org. http://lists.webjunction.org/web4lib/ (accessed june 12, 2007). archive is available at this link as well. twidale, michael b., et al. 1997. collaboration in physical and digital libraries. report no. 64, british library research and innovation centre. http://www.comp.lancs.ac.uk/ computing/research/cseg/projects/ariadne/bl/report/ (accessed june 12, 2007). twidale, michael b., and david m. nichols. 1998a. using studies of collaborative activity in physical environments to inform the design of digital libraries. technical report cseg/11/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/ projects/ariadne/docs/cscw98.html (accessed june 12, 2007). twidale, michael b., and david m. nichols. 1998b. a survey of applications of cscw for digital libraries. technical report cseg/4/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/ projects/ariadne/docs/survey.html (accessed june 12, 2007). webjunction. n.d. dig_ref electronic discussion list. http:// www.vrd.org/dig_ref/dig_ref.shtml (accessed june 12, 2007). wikipedia. 2007a. wiki. http://en.wikipedia.org/wiki/wiki (accessed april 29, 2007). wikipedia. 2007b. wikiwikiweb. http://en.wikipedia.org/ wiki/wikiwikiweb (accessed april 29, 2007). 36 information technology and libraries | september 200736 information technology and libraries | september 2007 appendix. wikis in libraries i = collaboration between libraries ii = collaboration between library staff iii = collaboration between library staff and patrons iv = collaboration between patrons category description location wiki software i library success: a best practices wiki—a wiki capturing library success stories. covers a wide variety of topics. also features a presentation about wikis http://www.libsuccess. 
org/index.php?title=wiki_world http://www.libsuccess.org/ mediawiki i wiki for school library association in alaska http://akasl.pbwiki.com/ pbwiki i wiki to support reserves direct. free, open­source software for managing academic reserves materials developed by emory university. http://www.reservesdirect.org/ wiki/index.php/main_page mediawiki i sunyla new tech wiki—a place for state university of new york (suny) librarians to share how they are using information technologies to interact with patrons http://sunylanewtechwiki.pbwiki. com/ pbwiki i wiki for librarians and faculty members to collaborate across campuses. being used with distance learning instructors and small groups message from robin shapiro. on [dig_ref] electronic discussion list dated 10/18/2006. unknown i discusses setting up three wikis in last month: “one to sup­ port a pre­conference workshop, another for behind­the­ scenes conferences planning by local organizers, and one for conference attendees to use before they arrived and during the sessions” (30). fichter, darlene. 2006. using wikis to support online collaboration in libraries. information outlook 10, no.1: 30­31. unknown i unofficial wiki to the american library association 2005 annual conference http://meredith.wolfwater.com/ wiki/index.php?title=main_page mediawiki i unofficial wiki to the 2005 internet librarian conference http://ili2005.xwiki.com/xwiki/bin/ view/main/webhome xwiki i wiki for the canadian library association (cla) 2005 annual conference http://wiki.ucalgary.ca/page/cla mediawiki i wiki for south carolina library association http://www.scla.org/governance/ homepage pmwiki i wiki set up to support national discussion about institutional repositories in new zealand http://wiki.tertiary.govt.nz/ ~institutionalrepositories pmwiki i the oregon library instruction wiki used for sharing infor­ mation about library instruction http://instructionwiki.org/ mediawiki i personal repositories online wiki environment (prowe)— an online repository sponsored by the open university and the university of leicester that uses wikis and blogs to encourage the open exchange of ideas across communities of practice http://www.prowe.ac.uk/ unknown article title | author 37wikis in libraries | bejune 37 category description location wiki software i lis wiki—space for collecting articles and general informa­ tion about library and information science http://liswiki.org/wiki/main_page mediawiki i making of modern michigan—a wiki to support a state­wide digital library project http://blog.lib.msu.edu/mmmwiki/ index.php/main_page unknown (behind firewall) i wiki used as a web content editing tool in a digital library initiative sponsored by emory university, the university of arizona, virginia tech, and the university of notre dame http://sunylanewtechwiki.pbwiki .com/ pbwiki ii wiki at suny stony brook health sciences library used as knowledge base http://appdev.hsclib.sunysb.edu/ twiki/bin/view/main/webhome; presentation can be found at: http:// ms.cc.sunysb.edu/%7edachase/ wikisinaction.htm twiki ii wiki at york university used internally for committee work. exploring how to use wikis as a way to collaborate with users message from mark robertson. on web4lib electronic discussion list dated 10/13/2006. unknown ii wiki for internal staff use at the university of waterloo. they utilize access control to restrict parts of the wiki to groups message from chris gray. on web4lib electronic discussion list dated 08/09/2006. 
unknown ii wiki at the university of toronto for internal communica­ tions, technical problems, and as a document repository message from stephanie walker. on libref­l electronic discussion list dated 10/28/2006. unknown ii wiki used for coordination and organization of portable professor program, which appears to be a collaborative infor­ mation literacy program for remote faculty http://tfpp­committee.pbwiki.com/ pbwiki ii the university of connecticut libraries’ staff wiki which is a repository of information technology services documents http://wiki.lib.uconn.edu/wiki/ main_page mediawiki ii wiki used at binghamton university libraries for staff intranet. features pages for committees, documentation, policies, newsletters, presentations, and travel reports screenshots can be found at http://library.lib.binghamton.edu/ presentations/cil2006/cil%202006 _wikis.pdf mediawiki ii wiki used at the information desk at miami university described in: withers, rob. “something wiki this way comes.” c&rl news 66, no. 11 (2005): 775–77. unknown ii use of wiki as knowledge base to support reference service http://oregonstate.edu/~reeset/ rdm/ unknown ii university of minnesota libraries staff web site in wiki form https://wiki.lib.umn.edu/ pmwiki ii wiki used to support the mit engineering and science libraries b­team. the wiki may no longer be active, but is still available http://www.seedwiki.com/wiki/b­ team seedwiki iii a wiki that is subject guide at st. joseph county public library in south bend, indiana http://www.libraryforlife.org/ subjectguides/index.php/main_page mediawiki 3� information technology and libraries | september 20073� information technology and libraries | september 2007 category description location wiki software iii wiki used at the aiken library, university of south carolina as a content management system (cms) http://library.usca.edu/main/ homepage pmwiki iii doucette library of teaching resources wiki—a repository of resources for education students http://wiki.ucalgary.ca/page/ doucette mediawiki iv wiki worldcat (wikid) is an oclc pilot project (now defunct) that allowed users to add reviews to open worldcat records http://www.oclc.org/product­ works/wcwiki.htm unknown iii and iv wikiref lists reviews of reference resources—databases, books, web sites, etc. —created by butler librarians, faculty, staff, and students. http://www.seedwiki.com/wiki/ butler_wikiref; reported in matthies, brad, jonathan helmke, and paul slater. using a wiki to enhance library instruction. indiana libraries 25, no. 3 (2006): 32–34. seedwiki iii and iv wiki used as a subject guide at ohio university http://www.library.ohiou.edu/sub­ jects/bizwiki/index.php/main_page; presentation about the wiki: http://www.infotoday.com/cil2006/ presentations/c101­102_boeninger .pps mediawiki lib-s-mocs-kmc364-20140601053820 book reviews proceedings of the conference on interlibrary communications and information networks, edited by joseph becker, sponsored by the american library association and the u.s. office of education, bureau of libraries and educational technology held at airlie house, warrenton, virginia, september 28, 1970-0ctober 2, 1970. chicago: american library association, 1971. 347p to see how rapidly the field of library networking and communications has moved in recent times, one need only try to review a conference on the subject some years after it was held. 
what was fresh, imaginative, innovative, or blue-sky has become accepted or gone beyond; errors in thinking or bad guesses as to the future have been shown up; and the blue sky has been divided into lower stratospheres and outer space for ease of working. under these circumstances one can only review such proceedings as history. the assumptions on which the conference was based were the traditional ones of librarians and information scientists-that access to information should be the right of anyone without regard to geographical or economic position, and that pooling of resources (here by networking operations) is one of the best ways to reach that goal. since 1970 both of these assumptions have been questioned, but at the time of the conference there were no opposing voices. the final conclusions, of course, were based on these assumptions. national systems were recommended, both governmental and private, with the establishment of a public corporation (such as the corporation for public broadcasting) as the central stimulator, coordinator, and regulator, to be served by input from a large number of groups. funding, the attendees decided, should be pluralistic, from public, private, and foundation sources (are there any others?), but with the federal government bearing the largest burden of support. since it is deemed desirable to give the widest chance for all individuals to use these networks, it was recommended that fee-forservice prices should be kept low through subventions of the telecommunications costs by libraries and information centers. and since new techniques and methods need to be learned, both education and research in the field must be strengthened and enlarged. since the basic components of networks of libraries and information centers was conceived as being: 1. bibliographic access to media 2. mediation of user request to information book reviews 245 3. delivery of media to users 4. education traditional questions of bibliographic description, the most useful form of public services (including such things as interviewing requestors, seeking information on the existence of answers, locating the answers physically, providing them, evaluating them and obtaining feedback), as well as the best ways to set up networks were discussed at length. moreover, since new technologies have sometimes been touted as the answer to many of these problems, a whole section on network technology was included. such subjects as telecommunications, cable television, and computers were examined; here most of the recommendations still remain to be carried out. the organization proposed for these networks again plowed old ground. the conferees felt that one should use the tremendous national and disciplinary resources already established (the library of congress, the national library of medicine, the national agricultural library, chemical abstracts, etc.); there should be a coordinating body to minimize duplication of effort and assure across-the-board coverage; the systems must be sold to legislators if public money is to be provided; and more research on the best networking operations is necessary. above all in almost every section of the report and in the preface the then-new national commission on libraries and information science was referred to as the great savior. together with requests for public money, it might be said, this was the thread binding all sections of the conference together. was this conference necessary? 
could it have brought forth something more useful than the gentle spoof in irwin pizer's poem "hiawatha's network?" it was undoubtedly very inspiring for those at the conferenceall 100 of them-who probably learned more over the cocktail glass and dinner plate than at the formal sessions, and who learned as they grappled with the difficulties of consensus-making. but need the proceedings have been published? is everything ever said at a meeting always worth preserving? how about the concept of ephemera rather than total recall? would not a short summary of the recommendations have sufficed? estelle brodman 16 information technology and libraries | march 2009 mathew j. miles and scott j. bergstrom classification of library resources by subject on the library website: is there an optimal number of subject labels? the number of labels used to organize resources by subject varies greatly among library websites. some librarians choose very short lists of labels while others choose much longer lists. we conducted a study with 120 students and staff to try to answer the following question: what is the effect of the number of labels in a list on response time to research questions? what we found is that response time increases gradually as the number of the items in the list grow until the list size reaches approximately fifty items. at that point, response time increases significantly. no association between response time and relevance was found. i t is clear that academic librarians face a daunting task drawing users to their library’s web presence. “nearly three-quarters (73%) of college students say they use the internet more than the library, while only 9% said they use the library more than the internet for information searching.”1 improving the usability of the library websites therefore should be a primary concern for librarians. one feature common to most library websites is a list of resources organized by subject. libraries seem to use similar subject labels in their categorization of resources. however, the number of subject labels varies greatly. some use as few as five subject labels while others use more than one hundred. in this study we address the following question: what is the effect of the number of subject labels in a list on response times to research questions? n literature review mcgillis and toms conducted a performance test in which users were asked to find a database by navigating through a library website. they found that participants “had difficulties in choosing from the categories on the home page and, subsequently, in figuring out which database to select.”2 a review of relevant research literature yielded a number of theses and dissertations in which the authors compared the usability of different library websites. jeng in particular analyzed a great deal of the usability testing published concerning the digital library. the following are some of the points she summarized that were highly relevant to our study: n user “lostness”: users did not understand the structure of the digital library. n ambiguity of terminology: problems with wording accounted for 36 percent of usability problems. n finding periodical articles and subject-specific databases was a challenge for users.3 a significant body of research not specific to libraries provides a useful context for the present research. 
miller’s landmark study regarding the capacity of human short-term memory showed as a rule that the span of immediate memory is about 7 ± 2 items.4 sometimes this finding is misapplied to suggest that menus with more than nine subject labels should never be used on a webpage. subsequent research has shown that “chunking,” which is the process of organizing items into “a collection of elements having strong associations with one another, but weak associations with elements within other chunks,”5 allows human short-term memory to handle a far larger set of items at a time. larson and czerwinski provide important insights into menuing structures. for example, increasing the depth (the number of levels) of a menu harms search performance on the web. they also state that “as you increase breadth and/or depth, reaction time, error rates, and perceived complexity will all increase.”6 however, they concluded that a “medium condition of breadth and depth outperformed the broadest, shallow web structure overall.”7 this finding is somewhat contrary to a previous study by snowberry, parkinson, and sisson, who found that when testing structures of 2^6, 4^3, 8^2, and 64^1 (2^6 means two menu items per level, six levels deep; all four structures present the same sixty-four terminal items), the 64^1 structure grouped into categories proved to be advantageous in both speed and accuracy.8 larson and czerwinski recommended that “as a general principle, the depth of a tree structure should be minimized by providing broad menus of up to eight or nine items each.”9 zaphiris also corroborated that previous research concerning depth and breadth of the tree structure was true for the web. the deeper the tree structure, the slower the user performance.10 he also found that response times for expandable menus are on average 50 percent longer than sequential menus.11 both the research and current practices are clear concerning the efficacy of hierarchical menu structures. thus it was not a focus of our research. the focus instead was on a single-level menu and how the number and characteristics of subject labels would affect search response times. n background in preparation for this study, library subject lists were collected from a set of thirty library websites in the united states, canada, and the united kingdom. we selected twelve lists from these websites that were representative of the entire group and that varied in size from small to large. to render some of these lists more usable, we made slight modifications. there were many similarities between label names. n research design participants were randomly assigned to one of twelve experimental groups. each experimental group would be shown one of the twelve lists that were selected for use in this study. roughly 90 percent of the participants were students. the remaining 10 percent of the participants were full-time employees who worked in these same departments.
mathew j. miles (milesm@byui.edu) is systems librarian and scott j. bergstrom (bergstroms@byui.edu) is director of institutional research at brigham young university–idaho in rexburg.
the twelve lists ranged in number of labels from five to seventy-two:
group a: 5 subject labels
group b: 9 subject labels
group c: 9 subject labels
group d: 23 subject labels
group e: 6 subject labels
group f: 7 subject labels
group g: 12 subject labels
group h: 9 subject labels
group i: 35 subject labels
group j: 28 subject labels
group k: 49 subject labels
group l: 72 subject labels
each participant was asked to select a subject label from a list in response to eleven different research questions. the questions are listed below:
1. which category would most likely have information about modern graphical design?
2. which category would most likely have information about the aztec empire of ancient mexico?
3. which category would most likely have information about the effects of standardized testing on high school classroom teaching?
4. which category would most likely have information on skateboarding?
5. which category would most likely have information on repetitive stress injuries?
6. which category would most likely have information about the french revolution?
7. which category would most likely have information concerning walmart’s marketing strategy?
8. which category would most likely have information on the reintroduction of wolves into yellowstone park?
9. which category would most likely have information about the effects of increased use of nuclear power on the price of natural gas?
10. which category would most likely have information on the electoral college?
11. which category would most likely have information on the philosopher immanuel kant?
the questions were designed to represent a variety of subject areas that library patrons might pursue. each subject list was printed on a white sheet of paper in alphabetical order in a single column, or double columns when needed. we did not attempt to test the subject lists in the context of any web design. we were more interested in observing the effect of the number of labels in a list on response time independent of any web design. each participant was asked the same eleven questions in the same order. the order of questions was fixed because we were not interested in testing for the effect of order and wanted a uniform treatment, thereby not introducing extraneous variance into the results. for each question, the participant was asked to select a label from the subject list under which they would expect to find a resource that would best provide information to answer the question. participants were also instructed to select only a single label, even if they could think of more than one label as a possible answer. participants were encouraged to ask for clarification if they did not fully understand the question being asked. recording of response times did not begin until clarification of the question had been given. response times were recorded unbeknownst to the participant. if the participant was simply unable to make a selection, that was also recorded. two people administered the exercise. one recorded response times; the other asked the questions and recorded label selections. relevance rankings were calculated for each possible combination of labels within a subject list for each question. for example, if a subject list consisted of five labels, for each question there were five possible answers. two library professionals—one with humanities expertise, the other with sciences expertise—assigned a relevance ranking to every possible combination of question and labels within a subject list.
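as a concrete illustration of the bookkeeping this implies, the sketch below tabulates two raters’ scores for a few question–label pairs and combines them into a single relevance value per pair; the labels, scores, and column names are hypothetical placeholders, not data from the study.

# a minimal sketch (not the authors' code): combining two raters' relevance
# judgments into one score per question-label pair. all values are hypothetical.
import pandas as pd

ratings = pd.DataFrame([
    {"question": 1, "label": "art & architecture", "rater": "humanities", "score": 4},
    {"question": 1, "label": "art & architecture", "rater": "sciences",   "score": 3},
    {"question": 1, "label": "business",           "rater": "humanities", "score": 1},
    {"question": 1, "label": "business",           "rater": "sciences",   "score": 1},
])

# average the two professionals' scores for each question-label combination
relevance = (
    ratings.groupby(["question", "label"], as_index=False)["score"]
    .mean()
    .rename(columns={"score": "relevance"})
)
print(relevance)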
the rankings were then averaged for each question–label combination. n results the analysis of the data was undertaken to determine whether the average response times of participants, adjusted by the different levels of relevance in the subject list labels that prevailed for a given question, were significantly different across the different lists. in other words, would the response times of participants using a particular list, for whom the labels in the list were highly relevant to the question, be different from students using the other lists for whom the labels in the list were also highly relevant to the question? a separate univariate general linear model analysis was conducted for each of the eleven questions. the analyses were conducted separately because each question represented a unique search domain. the univariate general linear model provided a technique for testing whether the average response times associated with the different lists were significantly different from each other. this technique also allowed for the inclusion of a covariate—relevance of the subject list labels to the question—to determine whether response times at an equivalent level of relevance were different across lists. in the analysis model, the dependent variable was response time, defined as the time needed to select a subject list label. the covariate was relevance, defined as the perceived match between a label and the question. for example, a label of “economics” would be assessed as highly relevant to the question, what is the current unemployment rate? the same label would be assessed as not relevant for the question, what are the names of four moons of saturn? the main factor in the model was the actual list being presented to the participant. there were twelve lists used in this study. the statistical model can be summarized as follows:
response time = list + relevance + (list × relevance) + error
the general linear model required that the following conditions be met: first, data must come from a random sample from a normal population. second, all variances within each of the groupings are the same (i.e., they have homoscedasticity). an examination of whether these assumptions were met revealed problems both with normality and with homoscedasticity. a common technique—logarithmic transformation—was employed to resolve these problems. accordingly, response-time data were all converted to common logarithms. an examination of assumptions with the transformed data showed that all questions but three met the required conditions. the three questions (5, 6, and 7) were excluded from subsequent analysis.
figure 1. the overall average of average search times for the eight questions for all experimental groups (i.e., lists)
n conclusions the series of graphs in the appendix show the average response times, adjusted for relevance, for eight of the eleven questions for all twelve lists (i.e., experimental groups). three of the eleven questions were excluded from the analysis because of heteroscedasticity. an inspection of these graphs shows no consistent pattern in response time as the number of the items in the lists increase. essentially, this means that, for any given level of relevance, the number of items of the list does not affect response time significantly.
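a minimal sketch of the kind of per-question model described above, under the assumption that the trial data sit in a table with one row per participant–question trial (the file name and column names are illustrative, not the authors’):

# a rough sketch (not the authors' code) of the univariate general linear model
# described above: common-log response time as the dependent variable, list as
# the main factor, relevance as the covariate, plus their interaction, fit
# separately for each question.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

trials = pd.read_csv("trials.csv")          # hypothetical file: one row per participant-question trial
trials["log_rt"] = np.log10(trials["rt"])   # common-logarithm transform, as in the study

for question, block in trials.groupby("question"):
    model = smf.ols("log_rt ~ C(list_id) * relevance", data=block).fit()
    print(question)
    print(sm.stats.anova_lm(model, typ=2))  # tests the list effect at an equivalent level of relevance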
it seems that for a single question, characteristics of the categories themselves are more important than the quantity of categories in the list. the response times using a subject list with twenty-eight labels is similar to the response times using a list of six labels. a statistical comparison of the mean response time for each classification of library resources by subject on the library website | miles and bergstrom 19 group with that of each of the other groups for each of the questions largely confirms this. there were very few statistically significant different comparisons. the spikes and valleys of the graphs in the appendix are generally not significantly different. however, when the average response time associated with all lists is combined into an overall average from all eight questions, a somewhat clearer picture emerges (see figure 1). response times increase gradually as the number of the items in the list increase until the list size reaches approximately fifty items. at that point, response time increases significantly. no association was found between response time and relevance. a fast response time did not necessarily yield a relevant response, nor did a slow response time yield an irrelevant response. n observations we observed that there were two basic patterns exhibited when participants made selections. the first pattern was the quick selection—participants easily made a selection after performing an initial scan of the available labels. nevertheless, a quick selection did not always mean a relevant selection. the second pattern was the delayed selection. if participants were unable to make a selection after the initial scan of items, they would hesitate as they struggled to determine how the question might be reclassified to make one of the labels fit. we did not have access to a high-tech lab, so we were unable to track eye movement, but it appeared that the participants began scanning up and down the list of available items in an attempt to make a selection. the delayed selection seemed to be a combination of two problems: first, none of the available labels seemed to fit. second, the delay in scanning increased as the list grew larger. it’s possible that once the list becomes large enough, scanning begins to slow the selection process. a delayed selection did not necessarily yield an irrelevant selection. the label names themselves did not seem to be a significant factor affecting user performance. we did test three lists, each with nine items and each having different labels, and response times were similar for the three lists. a future study might compare a more extensive number of lists with the same number of items with different labels to see if label names have an effect on response time. this is a particular challenge to librarians in classifying the digital library, since they must come up with a few labels to classify all possible subjects. creating eleven questions to span a broad range of subjects is also a possible weakness of the study. we had to throw out three questions that violated the assumptions of the statistical model. we tried our best to select questions that would represent the broad subject areas of science, arts, and general interest. we also attempted to vary the difficulty of the questions. a different set of questions may yield different results. references 1. steve jones, the internet goes to college, ed. 
mary madden (washington, d.c.: pew internet and american life project, 2002): 3, www.pewinternet.org/pdfs/pip_college_report.pdf (accessed mar. 20, 2007). 2. louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62, no. 4 (2001): 361. 3. judy h. jeng, “usability of the digital library: an evaluation model” (phd diss., rutgers university, new brunswick, new jersey): 38–42. 4. george a. miller, “the magical number seven plus or minus two: some limits on our capacity for processing information,” psychological review 63, no. 2 (1956): 81–97. 5. fernand gobet et al., “chunking mechanisms in human learning,” trends in cognitive sciences 5, no. 6 (2001): 236–43. 6. kevin larson and mary czerwinski, “web page design: implications of memory, structure and scent for information retrieval” (los angeles: acm/addison-wesley, 1998): 25, http://doi.acm.org/10.1145/274644.274649 (accessed nov. 1, 2007). 7. ibid. 8. kathleen snowberry, mary parkinson, and norwood sisson, “computer display menus,” ergonomics 26, no 7 (1983): 705. 9. larson and czerwinski, “web page design,” 26. 10. panayiotis g. zaphiris, “depth vs. breath in the arrangement of web links,” www.soi.city.ac.uk/~zaphiri/papers/hfes .pdf (accessed nov. 1, 2007). 11. panayiotis g. zaphiris, ben shneiderman, and kent l. norman, “expandable indexes versus sequential menus for searching hierarchies on the world wide web,” http:// citeseer.ist.psu.edu/rd/0%2c443461%2c1%2c0.25%2cdow nload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/ cache/papers/cs/22119/http:zszzszagrino.orgzszpzaphiriz szpaperszszexpandableindexes.pdf/zaphiris99expandable.pdf (accessed nov. 1, 2007). 20 information technology and libraries | march 2009 appendix. 
response times by question by group
[eight charts, one each for questions 1, 2, 3, 4, 8, 9, 10, and 11, plotting average log response time for group a (5 items) through group l (72 items)]
margaret brown-sica. tutorial. playing tag in the dark: diagnosing slowness in library response time. in this article the author explores how the systems department at the auraria library (which serves more than thirty thousand primarily commuting students at the university of colorado–denver, the metropolitan state college of denver, and the community college of denver) diagnosed and analyzed slow response time when querying proprietary databases. issues examined include vendor issues, proxy issues, library network hardware, and bandwidth and network traffic. “why is everything so slow?” this is the question that library systems departments often have the most trouble answering. it is also easy to dismiss because it is often the fault of factors beyond the control of library staff. what usually prompts these questions are the experiences of the reference librarians.
when these librarians are trying to help students at the reference desk, it is very frustrating when databases seem to respond to queries slowly, files take forever to load onto the computer screen, and all the while the line in front of the desk get continues to grow. or the library gets calls from students using databases and the catalog from their homes who complain that searching library resources takes too long, and that they are getting frustrated and using google instead. this question is so painful because libraries spend so much of their shrinking budgets on high quality information in the form of expensive proprietary databases, and it is all wasted if users have trouble using them. in this case the problem seemed to be how slow the process of searching for information and downloading documents from databases was. for lack of a better term, the auraria library called this the “response time” problem. this article will discuss the various ways the systems (technology) department of the auraria library, which serves the university of colorado–denver, metropolitan state college of denver, and the community college of denver, tried to identify problems and improve database response time. the systems department defined “response time” as the time it took for a person to send a query from a computer at home or in the library to a proprietary information database and receive a response back, or how long it took to load a selected fulltext article from a database. when a customer sets out to use a database in the library, the query to the database could be slowed down by many different factors. the first is the proxy, in our case innovative interfaces’ inc. web access management (iii wam), a product that authenticates the user via the iii api (application program interface) product. to do this the query travels over network hardware, switches, and wires to the iii server and back again. then the query goes to the database’s server, which may be almost anywhere in the world. hardware problems at the database vendor’s end can affect this transfer. in the case of auraria library this transfer can be influenced by traffic on the library’s network, the university’s network, and any other place in between. this could also be hampered by the amount of memory in the computer where the query originates, by the amount of tasks being performed by that computer, etc. the bandwidth of the network and its speed can also have an effect. basically, the bottlenecks needed to be found and fixed. bottlenecks are described by webopedia as “the delay in transmission of data through the circuits of a computer’s microprocessor or over a tcp/ip network. the delay typically occurs when a system’s bandwidth cannot support the amount of information being relayed at the speed it is being processed. there are, however, many factors that can create a bottleneck in a system.”1 literature review there is not a lot on database response slowness in library literature, probably because the issue overlaps with computer science and really is not one problem but a possibility of one of several problems. the issue is figuring out where the problem lies. gerhan and mutula examined technical reasons for network slowness, performing bandwidth testing at a library in botswana and one in the united states using the same computer, and giving several suggestions for testing, fixing technical problems, and issues to examine. 
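in practice this kind of bandwidth check reduces to timing the transfer of a payload of known size; a rough, illustrative sketch follows (the url is a placeholder, and this is not the meter or method used in the studies cited here):

# a rough bandwidth spot check for illustration only: time the download of a
# file of known size and report approximate throughput. the url is a placeholder.
import time
import urllib.request

TEST_URL = "https://example.org/test-file.bin"   # placeholder: any file of fixed, known size

start = time.perf_counter()
with urllib.request.urlopen(TEST_URL, timeout=30) as response:
    payload = response.read()
elapsed = time.perf_counter() - start

kilobits = len(payload) * 8 / 1000
print(f"downloaded {len(payload)} bytes in {elapsed:.2f} s, about {kilobits / elapsed:.0f} kbps")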
gerhan and mutula concluded that bandwidth and insufficient network infrastructure were the main culprits in their situation. they studied both bandwidth and bandwidth “squeeze.” looking for the bandwidth “squeeze” means looking along the internet’s “journey of many stages through routers and exchange points, each successively farther removed from the user.”2 bandwidth bottlenecks could occur at any one or more of those stages in the query’s transmission. the following four sections parse that lengthy pathway and examine how each may contribute to delays. badue et al. in their article “basic issues on the processing of web queries,” described web queries, load balancing, and how they function.3 bertot and mcclure’s “assessing sufficiency and quality of bandwidth for public libraries” is based on data collected as part of the 2006 public libraries and the internet study and provides a very straightforward approach for checking specific areas for problems.4 it outlines why basic data such as bandwidth readings may not give the complete picture. it also gives a nice outline of factors involved such as local settings and parameters, ultimate connectivity path, application resource needs, and protocol priority. azuma, okamoto, hasegawa, and masayuki’s “design, implementation and evaluation of resource management system for internet servers” was very helpful in understanding the role and function of proxy servers and problems they can present.5 vendor issues this is a very thorny topic because it is out of the library’s control, and also because the library has so many databases. the systems department asked the reference staff to send reports of problems listing the type of activity attempted, time and dates, the names of the database, the problem and any error messages encountered. a few that seemed to be the slowest were selected for special examination. one vendor worked extensively with the library and in the end it was believed that there were problems at their end in load balancing, which eventually seemed to be fixed. that company was in the middle of a merger and that may have also been an issue. we also noted that a database that uses very large image files, artstor, was hard to use because it was so slow. this company sent the library an application that simulated the databases’ use and was supposed to test to see if bandwidth at auraria library was sufficient for that database. according to the test, it was. databases that consistently were perceived as the slowest were those that had the largest documents and pictures, such as those that used primarily pdfs and visual material. this, with the results of the testing, pointed to a problem independent of vendor issues. bandwidth and network traffic the systems department decided to do bandwidth testing on the library’s public and staff computers after reading gerhan and mutula’s article about the university of botswana. the general perception is that bandwidth is often the primary problem in network slowness, as well as the problems with databases that use larger files.
margaret brown-sica (margaret.brown-sica@ucdenver.edu) is head of technology and distance education support, auraria library, serving the university of colorado–denver, metropolitan state college of denver, and the community college of denver.
several of the computers were tested in several successive days during what is usually the busiest time for the network, between noon and 2 p.m. the results were good, averaging about 3000 kilobytes per second (kbps). for this test we used the cnet bandwidth meter, which downloads an image to your computer, measures the time of the download, and compares it to the maximum speeds offered by other internet service providers.6 there are several bandwidth meters available on the internet. when the network administrator checked the switches for network traffic, they showed low traffic, almost always less than 20 percent of capacity. this was confusing: if the problem was neither with the bandwidth nor the vendors, what was causing the slow network performance? one of the university network administrators was consulted to see if any factor in their sphere could be having an effect on our network. we knew that the main university network had implemented a bandwidth shaper to regulate bandwidth. “these devices limit bandwidth . . . by greedy applications, guarantee minimum throughput for users, groups or protocols, and better utilize widearea connections by smoothing out bursty traffic.”7 it was thought that perhaps this might be incorrectly prioritizing some of the library’s traffic. this was a dead end, though—the network administrators had stopped using the device. if the bandwidth was good and the traffic was manageable, then the problem appeared to not be at the library. however, according to bertot and mcclure, the bandwidth question is complex because typically an arbitrary number describes the number of kbps used to define “broadband.” . . . such arbitrary definitions to describe bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term “high speed” for connections of 200kbps in at least one direction. there are three problematic issues with this definition: 1. it specifies unidirectional bandwidth, meaning that a 200kbps download, but a much slower upload (e.g., 56kbps) would fit this definition; 2. regardless of direction, bandwidth of 200kbps is neither high speed nor does it allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly. 3. the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high-use multiple-workstation public-access context.8 proxy issues auraria library uses the iii wam proxy server product. there were several things that pointed to the introducing zoomify image | smith 31playing tag in the dark: diagnosing slowness in library response time | brown-sica 31 proxy being an issue. one was that the systems department had been experimenting with invoking the proxy in the library building in order to collect more accurate statistics and found that complaints about speed seemed to have started around the same time as this experiment. but if the bandwidth was not showing inadequacy and the traffic was light, why was this happening? the answer is better explained by azuma et al.: needless to say, busy web servers must have many simultaneous http sessions, and server throughput is degraded when effective resource management is not considered, even with large network capacity. 
web proxy servers must also accommodate a large number of tcp connections, since they are usually prepared by isps (internet service providers) for their customers. furthermore, proxy servers must handle both upward tcp connections (from proxy server to web servers) and downward tcp connections (from client hosts to proxy server). hence, the proxy server becomes a likely spot for bottlenecks to occur during web document transfers, even when the bandwidth of the network and web server performance are adequate.9 testing was done from on campus and off campus, with and without using the proxy server. the results showed that the connection was faster without the proxy. when testing was done from the health sciences library at the university of colorado with the same type of server and proxy, the response time was much faster. the difference between auraria library and the other library is that the community auraria library serves (the community college of denver, metropolitan state college, and the university of colorado–denver) has a much larger user population who overwhelmingly use databases from home, therefore taxing the proxy server. the other library belonged to a smaller campus, but the hardware was the same. the proxy was immediately dropped for on-campus users, and that resulted in some responsetime improvements. a conference call was set up with the proxy vendor to determine if improvements in response time might be attained by changing from a proxy server to ldap (lightweight directory access protocol) authentication. the response given was that although there might be other benefits, increased response time was not one of them. library network hardware it was evident that the biggest bottleneck was the proxy, so the systems department decided to take a closer look at iii’s hardware. the switch that regulated traffic between the network and the server that houses our integrated library system, part of which is the proxy server, was discovered to have been set at “halfduplex.” half-duplex refers to the transmission of data in just one direction at a time. for example, a walkie-talkie is a half-duplex device because only one party can talk at a time. in contrast, a telephone is a full-duplex device because both parties can talk simultaneously. duplex modes often are used in reference to network data transmissions. some modems contain a switch that lets you select between halfduplex and full-duplex modes. the correct choice depends on which program you are using to transmit data through the modem.10 when this setting was changed to full duplex response time increased. there was also concern that this switch had not been functioning as well as it could. the switch was replaced, and this also improved response time. in addition, the old server purchased through iii was a generic server that had specifications based on the demands of the ils software and didn’t into consideration the amount of traffic going to the proxy server. auraria library, which serves a campus of more than thirty thousand full-time equivalent students, is a library with one of the largest commuter student populations in the country. a new server had been scheduled to be purchased in the near future, so a call was made to the ils vendor to talk about our hypothesis and requirements. the vendor agreed that the library should change the specification on the new server to make sure it served the library’s unique demands. 
a server will be purchased with increased memory and a second processor in the hope of keeping these problems from happening again in the next few years. also, the cabling between the switch and the server was changed to better facilitate heavy traffic. conclusion although it is sometimes a daunting task to try to discover where problems occur in the library’s database response time because there are so many contributing factors and because librarians often do not feel that they have enough technical knowledge to analyze such problems, there are certain things that can be examined and analyzed. it is important to look at how each library is unique and may be inadequately served by current bandwidth and hardware configurations. it is also important not to be intimidated by computer science literature and to trust patterns of reported problems. the auraria library systems department was fortunate to also be able to compare problems with colleagues at other libraries and test in those libraries, which revealed issues that were unique and therefore most likely due to a problem at the library end. it is important to keep learning about how your system functions and to try to diagnose the problem by slowly looking at one piece at a time. though no one ever seems to be completely satisfied with the speed of their network, the employees of auraria library, especially those who work with the public, have been pleased with the increased speed they are experiencing when using proprietary databases. having improved on the response-time issue, other problems that are not caused by the proxy hardware have been illuminated, such as browser configuration, which may be hampering certain databases—something that had been attributed to the network. references 1. webopedia, s.v. “bottleneck,” www.webopedia.com/term/b/bottleneck.html (accessed oct. 8, 2008). 2. david r. gerhan and stephen mutula, “bandwidth bottlenecks at the university of botswana,” library hi tech 23, no. 1 (2005): 102–17. 3. claudine badue et al., “basic issues on the processing of web queries,” sigir forum; 2005 proceedings (new york: association for computing machinery, 2005): 577–78. 4. john carlo bertot and charles r. mcclure, “assessing sufficiency and quality of bandwidth for public libraries,” information technology and libraries 26, no. 1 (mar. 2007): 14–22. 5. kazuhiro azuma, takuya okamoto, go hasegawa, and murata masayuki, “design, implementation and evaluation of resource management system for internet servers,” journal of high speed networks 14, no. 4 (2005): 301–16. 6. “cnet bandwidth meter,” http://reviews.cnet.com/internet-speed-test (accessed oct. 8, 2008). 7. michael j. demaria, “warding off wan gridlock,” network computing, nov. 15, 2002, www.networkcomputing.com/showitem.jhtml?docid=1324f3 (accessed oct. 8, 2008). 8. bertot and mcclure, “assessing sufficiency and quality of bandwidth for public libraries,” 14. 9. azuma, okamoto, hasegawa, and masayuki, “design, implementation and evaluation of resource management system for internet servers,” 302. 10. webopedia, s.v. “half-duplex,” www.webopedia.com/term/h/half_duplex.html (accessed oct. 8, 2008). editorial: computing in the “cloud”: silver lining or stormy weather ahead? marc truitt cloud computing. remote hosting. software as a service (saas). outsourcing.
terms that all describe various parts of the same it elephant these days. the sexy ones—cloud computing, for example—emphasize new age-y, “2.0” virtues of collaboration and sharing with perhaps slightly mystic overtones: exactly where and what is the “cloud,” after all? others, such as the more utilitarian “remote hosting” and “outsourcing,” appeal more to the bean counters and sustainabilityminded among us. but they’re really all about the same thing: the tradeoff between cost and control. that the issue increasingly resonates with it operations at all levels these days can be seen in various ways. i’ll cite just a few: n at the meeting of the lita heads of library technology (holt) interest group at the 2009 ala annual conference in chicago, two topics dominated the list of proposed holt programs for the 2010 annual conference. one of these was the question of virtualization technology, and the other was the whole white hat–black hat dichotomy of the cloud.1 practically everyone in the room seemed to be looking at—or wanting to know more about—the cloud and how it might be used to benefit institutions. n my institution is considering outsourcing e-mail. all of it—to google. times are tough, and we’re being told that by handing e-mail over to the googleplex, our hardware, licensing, evergreening, and technical support fees will total zero. zilch. with no advertising. heady stuff when your campus hosts thirty-plus central and departmental mail servers, at least as many blackberry servers, and total costs in people, hardware, licensing, and infrastructure are estimated to exceed can$1,000,000 annually. n in the last couple of days, library electronic discussion lists such as web4lib have been abuzz— or do we now say a-twitter?—about amazon’s orwellian kindle episode, in which the firm deleted copies of 1984 and animal farm from subscribers’ kindle e-book readers without their knowledge or consent.2 indeed, amazon’s action was in violation of its own terms of service, in which the company “grants [the kindle owner] the non-exclusive right to keep a permanent copy of the applicable digital content and to view, use, and display such digital content an unlimited number of times, solely on the device or as authorized by amazon as part of the service and solely for [the kindle owner ’s] personal, noncommercial use.”3 all of this has me thinking back to the late 1990s marketing slogan of a manufacturer of consumer-grade mass storage devices—remember removable hard drives? iomega launched its advertising campaign for the 1 gb jaz drive with the catch-line “because it’s your stuff.” ultimately, whether we park it locally or send it to the cloud, i think we need to remember that it is our stuff. what i fear is that in straitened times, it becomes easy to forget this as we struggle to balance limited staff, infrastructure, and budgets. we wonder how we’ll find the time and resources to do all the sexy and forward-looking things, burdened as we are with the demands of supporting legacy applications, “utility” services, and a huge and constantly growing pile of all kinds of content that must be stored, served up, backed up (and, we hope, not too often, restored), migrated, and preserved. the buzz over the cloud and all its variants thus has a certain siren-like quality about it. the notion of signing over to someone else’s care—for little or no apparent cost—our basic services and even our own content (our stuff) is very appealing. 
the song is all the more persuasive in a climate where we’ve moved from just the normal bad news of merely doing more with less to a situation where staff layoffs are no longer limited to corporate and public libraries, but indeed extend now to our greatest institutions.4 at the risk of sounding like a paranoid naysayer to what might seem a no-brainer proposition, i’d like to suggest a few test questions for evaluating whether, how, and when we send our stuff into the cloud: 1. why are we doing this? what do we hope to gain? 2. what will it cost us? bear in mind that nothing is free—except, in the open-source community, where free beer is, unlike kittens, free. if, for example, the borg offer to provide institutional mail without advertisements, there is surely a cost somewhere. the borg, sensibly enough, are not in business to provide us with pro bono services. 3. what is the gain or loss to our staff and patrons in terms of local customization options, functionality, access, etc.? 4. how much control do we have over the service offered or how our content is used, stored, repurposed, or made available to other parties? 5. what’s the exit strategy? what if we want to pick up and move elsewhere? can we reclaim all of our stuff easily and portably, leaving no sign that we’d ever sent it to the cloud? we are responsible for the services we provide and for the content with which we have been entrusted. we cannot shrug off this duty by simply consigning our services and our stuff to the cloud. to do so leaves us vulnerable to an irreparable loss of credibility with our users; eventually some among them would rightly ask, “so what is it that you folks do, anyway?” we’re responsible for it—whether it’s at home or in the cloud—because it’s our stuff. it is our stuff, right? marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. references and notes 1. i should confess, in the interest of full disclosure, that it was eli neiburger of the ann arbor district library who suggested “hosted services as savior or slippery slope” for next year’s holt program. i’ve shamelessly filched eli’s topic, if not his catchy title, for this column. thanks, eli. also, again in the interest of full disclosure, i suggested the virtualization topic, which eventually won the support of the group. finally, some participants in the discussion observed that virtualization technology and hosting are in many ways two sides of the same topical coin, but i’ll leave that for others to debate. 2. brad stone, “amazon erases orwell books from kindle,” new york times, july 17, 2009, http://www.nytimes.com/2009/07/18/technology/companies/18amazon.html?_r=1 (accessed july 21, 2009). 3. amazon.com, “amazon kindle: license agreement and terms of use,” http://www.amazon.com/gp/help/customer/display.html?nodeid=200144530 (accessed july 21, 2009). 4. “budget cutbacks announced in libraries, center for professional development,” stanford university news, june 10, 2009, http://news.stanford.edu/news/2009/june17/layoffs-061709.html (accessed july 22, 2009); “harvard libraries cuts jobs, hours,” harvard crimson (online edition), june 26, 2009, http://www.thecrimson.com/article.aspx?ref=528524 (accessed july 22, 2009). president’s message cindi trainor (cindiann@gmail.com), lita president 2013-14 and community specialist & trainer for springshare, llc. hi, litans!
forum 2013 i'm excited that 2014 is almost here. last month saw a very successful forum in louisville, in my home state of kentucky. there were 243 people in attendance, and about half of those were first-time attendees. it's also typical of our yearly conference that there are a large number of attendees from the surrounding area; this is one of the reasons that it travels around the country. louisville's forum was the last of a few in the "middle" of the country--these included st. louis, atlanta, and columbus. next year, forum will move back out west, to albuquerque, nm. the theme for next year's conference will be "transformation: from node to network." see the lita blog (http://litablog.org/2013/11/call-for-proposals-2014-lita-forum/) for the call for proposals for concurrent sessions, poster sessions, and pre-conference workshops. goals of the organization at the board meeting in the fall, we took a stab at updating lita's major goal areas. the strategic plan had not been updated since 2010, so we felt it was time to update the goal areas, at least for the short term. the goals that we agreed upon will carry us through annual conference 2015 and will give us time to mount a more complete planning process in the meantime. they are: • collaboration & networking: foster collaboration and encourage networking among our members and beyond so the full potential of technologies in libraries can be realized. • education & sharing of expertise: offer education, publications, and events to inspire and enable members to improve technology integration within their libraries. • advocacy: advocate for meaningful legislation, policies, and standards that positively impact the current and future capabilities of libraries and that promote equitable access to information and technology. • infrastructure: improve lita’s organizational capacity to serve, educate, and create community for its members. midwinter activities in other governance news, the board will have an online meeting in january 2014, prior to the midwinter conference. our one-hour meeting will be spent asking and answering questions of those who typically submit written reports for board meetings: the vice-president, the president, and the executive director. as always, look to ala connect for these documents, which are posted publicly. we welcome your comments, as well as your attendance at any of our open meetings. our midwinter meeting schedule is: • the week of january 13: online meeting, time and date tba • saturday, january 25, 1:30–4:30 p.m., pcc 107a • monday, january 27, 1:30–4:30 p.m., pcc 115a as always, midwinter will also hold a lita happy hour (sunday, 6-8 pm, location tba), the top tech trends panel (sunday, 10:30 a.m., pcc 204a), and our annual membership meeting, the lita town meeting (monday 8:30 a.m., pcc 120c). we look forward to seeing you, in philadelphia or virtually. make sure to check the midwinter scheduler (http://alamw14.ala.org/scheduler) for all the details, including the forthcoming happy hour location. it's the best party^h^h^h^h^h networking event at midwinter! i would be remiss if i did not mention lita's committees and igs and their midwinter meetings. many will be meeting saturday morning at 10:30 a.m. (pcc 113abc)--so you can table-hop if you like.
expressing interest at midwinter is a great way to get involved. can't make it to philadelphia? no problem! fill out the online form to volunteer for a committee, or check out the connect groups of our interest groups. some of the igs meet virtually before midwinter; some committees and igs also invite virtual participation at midwinter itself. join us! http://alamw14.ala.org/scheduler 4 information technology and libraries | march 2005 the challenges encountered in building the international children’s digital library (icdl), a freely available online library of children’s literature are described. these challenges include selecting and processing books from different countries, handling and presenting multiple languages simultaneously, and addressing cultural differences. unlike other digital libraries that present content from one or a few languages and cultures, and focus on either adult or child audiences, icdl must serve a multilingual, multicultural, multigenerational audience. the research is presented as a case study for addressing these design criteria; current solutions and plans for future work are described. t he internet is a multilingual, multicultural, multigenerational environment. while once the domain of english-speaking, western, adult males, the demographics of the internet have changed remarkably over the last decade. as of march 2004, english was the native language of only 35 percent of the total world online population. as of march 2004, asia, europe, and north america each make up roughly 30 percent of internet usage worldwide.1 in the united states, women and men now use the internet in approximately equal numbers, and children and teenagers use the internet more than any other age group.2 creators of online digital libraries have recognized the benefit of making their content available to users around the world, not only for the obvious benefits of broader dissemination of information and cultural awareness, but also as tools for empowerment and strengthening community.3 creating digital libraries for children has also become a popular research topic as more children access the internet.4 the international children’s digital library (icdl) project seeks to combine these areas of research to address the needs of both international and intergenerational users.5 ■ background and related work creating international software is a complex process involving two steps: internationalization, where the core functionality of the software is separated from localized interface details, and localization, where the interface is customized for a particular audience.6 the localization step is not simply a matter of language translation, but involves technical, national, and cultural aspects of the software.7 technical details such as different operating systems, fonts, and file formats must be accommodated. national differences in language, punctuation, number formats, and text direction must be handled properly. finally, and perhaps most challenging, cultural differences must be addressed. hofstede defines culture as “the collective mental programming of the mind which distinguishes the members of one group or category of people from another.”8 these groups might be defined by national, regional, ethnic, religious, gender, generation, social class, or occupation differences. by age ten, most children have learned the value system of their culture, and it is very difficult to change. hofstede breaks culture into four components: values, rituals, heroes, and symbols. 
these components manifest themselves everywhere in software interfaces, from acceptable iconic representations of people, animals, and religious symbols to suitable colors, phrases, jokes, and scientific theories.9 however, as hoft notes, culture is like an iceberg: only 10 percent of the characteristics of a culture are visible on the surface.10 the rest are subjective, unspoken, and unconscious. it is only by evaluating an interface with users from the target culture that designers can understand if their software is acceptable.11 developers of online digital libraries have had to contend with international audiences for many years, and the marc and oclc systems have reflected this concern by including capabilities for transliteration and diacritical characters (accents) in various languages.12 however, it is only more recently, with the development of international character-set standards and web browsers that recognize these standards, that truly international digital libraries have emerged. the international children’s digital library: a case study in designing for a multilingual, multicultural, multigenerational audience hilary browne hutchinson, anne rose, benjamin b. bederson, ann carlson weeks, and allison druin hilary browne hutchinson (hilary@cs.umd.edu) is a faculty research assistant in the institute for advanced computer studies and a ph.d. student in the department of computer science. anne rose (rose@cs.umd.edu) is a faculty research assistant in the institute for advanced computer studies. benjamin b. bederson (bederson@cs.umd.edu) is an associate professor in the department of computer science and the institute for advanced computer studies and director of the human-computer interaction laboratory. ann carlson weeks (acweeks@umd.edu) is professor of the practice in the college of information studies. allison druin (allisond@umiacs.umd.edu) is an assistant professor in the college of information studies and the institute for advanced computer studies. all authors are affiliated with the university of maryland-college park and the human-computer interaction laboratory. greenstone, an open-source software project based in new zealand, allows people to create online digital libraries in their native language and culture.13 oclc recently completed a redesign of firstsearch, a web-based bibliographic and full-text retrieval service, to accommodate users with different software, languages, and disabilities.14 researchers at virginia tech redesigned citidel, an online collection of computer-science technical reports, to create an online community that allows users to translate their interface into different languages.15 researchers have also realized that beyond accessibility, digital libraries have enormous potential for empowerment and building community, especially in developing countries. witten et al. and downie describe the importance of community involvement when creating a digital library for a particular culture, both to empower users and to make sure the culture is accurately reflected.16 even more than accurately reflecting a culture, a digital library also needs to be understood by the culture. duncker notes that a digital-library interface metaphor based on a traditional physical library was incomprehensible to the maori culture in new zealand, who are not familiar with the conventions of western libraries.17 in addition to international libraries, a number of researchers have focused on creating digital libraries for children.
recognizing that children have difficulty with spelling, reading, and typing, as well as traditional categorization methods such as the dewey decimal system, a number of researchers have created more child-friendly digital libraries.18 pejtersen created the bookhouse interface with a metaphor of rooms in a house to support different types of searching.19 külper et al. designed the bücherschatz interface for children who are eight to ten years old using a treasure-hunt metaphor.20 druin et al. designed the querykids interface for young children to find information about animals.21 theng et al. used the greenstone software to create an environment for older children to write and share stories.22 the icdl project seeks to build on and combine research in both international and children’s digital libraries. as a result, icdl is more ambitious than other digital library projects in a number of respects. first, it is designed for a broader audience. while the digital libraries already described target one or a few cultures or languages, icdl’s audience includes potentially every culture and language in the world. second, the content is not localized. part of the library’s goal is to expose users to books from different cultures, so it would be counterproductive to present books only in a user’s native language. as a result, the interface not only supports multiple languages and cultures, but it also supports them simultaneously, frequently on the same screen. third, icdl’s audience not only includes a broad group of adults from around the world, but also children from three to thirteen years of age. to address these challenges, a multidisciplinary, multilingual, multicultural, and multigenerational team was created, and the development was divided into several stages. in the first stage, completed in november 2002, a java-based, english-only version of the library was created that addressed the searching and reading needs of children. in the second stage, completed in may 2003, an html version of the software was developed that addressed the needs of users with minimal technology. in the third stage, completed in may 2004, the metadata for the books in the library were translated into their native languages, allowing users to view these metadata in the language of their choice. the final stage, currently in progress, involves translating the interface to different languages and adjusting some of the visual design of the interface according to the cultural norms of the associated language being presented. in this paper, the research is presented as a case study, describing the solutions implemented to address some of these challenges and plans for addressing ongoing ones. ■ icdl project description the icdl project was initiated in 2002 by the university of maryland and the internet archive with funding from the national science foundation (nsf) and the institute for museum and library services (imls). today, the projects continues at the university of maryland. the goals of the project include: ■ creating a collection of ten thousand children’s books in one hundred languages; ■ collaborating with children as design partners to develop new interfaces for searching, browsing, reading, and sharing books in the library; and ■ evaluating the impact of access to multicultural materials on children, schools, and libraries. the project has two main audiences: children three to thirteen years of age and the adults who work with them, as well as international scholars who study children’s literature. 
the project draws together a multidisciplinary team of researchers from computer science, library science, education, and art backgrounds. the research team is also multigenerational—team members include children seven to eleven years of age, who work with the adult members of the team twice a week during the school year and for two weeks during the summer to help design and evaluate software. using the methods of cooperative inquiry, including brainstorming, lowtech prototyping, and observational note taking, the team has researched, designed, and built the library’s category structure, collection goals, and searching and reading interfaces.23 the international children’s digital library | hutchinson, rose, bederson, weeks, and druin 5 6 information technology and libraries | march 2005 the research team is also multilingual and multicultural. adult team members are native or fluent speakers of a number of languages besides english, and are working with school children and their teachers and librarians in the united states, new zealand, honduras, and germany to study how different cultures use both physical and digital libraries. the team is also working with children and their teachers in the united states, hungary, and argentina to understand how children who speak different languages can communicate and learn about each other’s cultures through sharing books. finally, an advisory board of librarians from around the world advises the team on curatorial and cultural issues, and numerous volunteers translate book and web-site information. ■ icdl interface description icdl has four search tools for accessing the current collection of approximately five hundred books in thirty languages: simple, advanced, location, and keyword. all are implemented with java servlet technology, use only html and javascript on the client side, and can run on a 56k modem. these interfaces were created during the first two development phases. the team visited physical libraries to observe children looking for books, developed a category hierarchy of kid-friendly terms based on these findings, and designed different tools for reading books.24 using the simple interface (figure 1), users can search for books using colorful buttons representing the most popular search categories. the advanced interface (figure 2), allows users to search for books in a compact, text-link-based interface that contains the entire librarycategory hierarchy. by selecting the location interface (figure 3), users can search for books by spinning a globe to select a continent. finally, with the keyword interface, users search for books by typing in a keyword. younger children seem to prefer the simplicity and fun of the location interface, while older children enjoy browsing the kid-friendly categories, such as colors, feelings, and shapes.25 all of these methods search the library for books with matching metadata. users can then read the book using a variety of book readers, including standard html pages and more elaborate java-based tools developed by the icdl team that present book pages in comic or spiral layouts (figures 4–6). in addition to the public interface, icdl also includes a private web site that was developed for book contributors to enter bibliographic metadata about the books they provide to the library (figures 7 and 8). 
using the metadata interface, contributors can enter information about their books in the native language of the book, and optionally translate or transliterate this information into english or latin-based characters. the design of icdl is driven by its audience, which includes users, contributors, and volunteers of all ages from around the world—more than six hundred thousand unique visitors from more than two hundred countries (at last count). as a result, books written in many different languages for users of different ages and cultural backgrounds must be collected, processed, stored, and presented. the rest of this paper will describe some of the challenges encountered and that are still being encountered in the development process, including selecting and processing a more diverse collection of books, handling different character sets and fonts, and addressing differences in cultural, religious, social, and political interpretation. figure 1. icdl simple interface. figure 2. icdl advanced interface. ■ book selection and processing the first challenge in the icdl project is obtaining and managing content. collecting books from around the world is a challenge because national libraries, publishers, and creators (authors and illustrators) all have different rules regarding copyrights. the goal is to identify and obtain award-winning children’s books from around the world, for example, books on the white ravens list, which are also made available to icdl users (www.icdlbooks.org/servlet/whiteravens).26 however, unsolicited books are received, frequently in languages the team cannot read. as a result, members of the advisory board and various children’s literature organizations in different countries are relied on to review these books. these groups help determine whether books are relevant and acceptable in the culture they are from, and whether they are appropriate for the three-to-thirteen age group. these groups are eager to help; including them in the process is an effective way to build the project and the community surrounding it. in addition to collecting and scanning books, bibliographical metadata in the native language of the book (title, creator[s], publisher, abstract) are also collected via the web-based metadata form filled out by the book contributors. it was decided to base the icdl metadata specification on the dublin core because of its international background, ability to be understood by nonspecialists, and the possibilities to extend its basic elements to meet icdl’s specific needs (see www.icdlbooks.org/metadata/specification for more details).27 contributors who provide metadata have the option of translating them to english; they also can transliterate them to latin characters, if necessary. regardless of what language or languages they provide, they are asked to provide information that they create themselves, such as the abstract, in a format that is easily understandable by children. figure 3. icdl location interface. figure 4. icdl standard book reader. figure 5. icdl comic book reader. figure 6. icdl spiral book reader. simple, short sentences make the information easy for children to read, and easier to translate to other languages. the metadata provided allow the team to catalog the books for browsing according to the various categories and to index the books for keyword searching.
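to make the shape of these records concrete, the python sketch below shows what a dublin core-based record of the kind just described might hold, using the where's the bear? example discussed later in this article; the icdl:-prefixed extension fields and their names are assumptions for illustration, not the project's actual element names.

# a sketch of a dublin core-based record with optional translated fields; the
# icdl:-prefixed extensions are hypothetical illustrations only.
record = {
    "dc:title": "where's the bear?",
    "dc:creator": "j. harris",
    "dc:publisher": "the j. paul getty museum",
    "dc:date": "1997",
    "dc:language": "eng",
    "dc:description": "a short abstract written in simple sentences for children.",
    "icdl:title_translated": "",        # filled in by volunteers when the original is not english
    "icdl:description_translated": "",  # likewise optional, as is a transliterated creator name
}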
even though translation to english is optional, the english-speaking metadata team needs the metadata in english in order to catalog the books. since many contributors do not have the time or ability to provide all of this information, volunteers who speak different languages are relied on to check the metadata that get submitted, and translate or transliterate them as necessary. this method allows information to be collected from contributors without overwhelming them, and also helps build and maintain the volunteer community. figure 7. icdl metadata interface with spanish metadata. figure 8. icdl metadata interface with japanese metadata. ■ handling different character sets the metadata form allows contributors to provide information from the comfort of an operating system and keyboard in their native language, but this flexibility requires software that can handle many different character sets. for example, english uses a latin character set; russian uses a cyrillic character set; and an arabic character set is used for persian/farsi. fortunately, there exists a single character set called unicode, an international, cross-platform standard that contains a unique encoding for nearly every character in every language.28 unfortunately, not all software supports unicode as yet. in the first stage of implementation in icdl, metadata information was collected only in english, so unicode compliance was not a problem. however, in the next phase of development, which included collecting and presenting metadata in the native language of all of the books, the software had to be adjusted to use unicode because icdl supports potentially every language in the world. the open-source mysql database, recently upgraded to allow storage of unicode data, was already in use for storing metadata. icdl’s web applications run on apache http and tomcat web servers, both of which are freely available and unicode-compliant. however, both the web site and the database had to be internationalized and localized to separate the template for metadata presentation from the content in different languages. a unicode-compliant database driver was necessary for passing information between the database and the web site. both the public and metadata web-site applications are written using freely available java servlet technology. the java language is unicode-compliant, but some adjustments had to be made to icdl’s servlet code to force it to handle data using unicode. to allow users to conduct keyword searches for books in the public interface, apache’s freely available lucene search engine is used to create indices of book metadata, which can then be searched. lucene is unicode-compliant, but a separate index for each language had to be created, requiring users to select a search language. this requirement was necessary for two reasons: (1) to avoid confusion over the same words with different meanings (bra means good in swedish); and (2) different languages have different rules for stopwords to ignore (the, of, a in english), truncation of similar words (cats has the same root as cat in english), and separation of characters (chinese does not put white space between symbols). lucene has text analyzers for a variety of languages that support these different conventions. for languages that lucene does not support, icdl volunteers translated english stopwords, and simple text analyzers were created by the team.
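to illustrate the kind of simple per-language analysis just described, the following python sketch applies a language-specific stopword list before adding tokens to a per-language index. it is a plain-python stand-in for what lucene's analyzers do, not icdl code; the tiny stopword lists and function names are illustrative assumptions.

from collections import defaultdict

# illustrative stopword lists only; real lists would be supplied per language by volunteers
STOPWORDS = {
    "eng": {"the", "of", "a"},
    "swe": {"och", "en", "att"},
}

def analyze(text, language):
    """lowercase, split on whitespace, and drop that language's stopwords."""
    stop = STOPWORDS.get(language, set())
    return [t for t in text.lower().split() if t not in stop]

# one index per language, so that e.g. swedish "bra" never collides with the
# english word spelled the same way
indexes = defaultdict(lambda: defaultdict(set))

def add_to_index(language, book_id, text):
    for token in analyze(text, language):
        indexes[language][token].add(book_id)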
finally, html headers created by the java servlets had to be modified to indicate that the content being delivered to users was in unicode. most current browsers and operating systems recognize and handle web pages properly delivered in unicode. for those that do not, help pages were created that explain how to configure common browsers to use unicode, and how to upgrade older browsers that do not support unicode. by making the icdl systems fully unicode-compliant, contributors from all over the world can enter metadata about books in an easily accessible html form using their native languages, and the characters are properly transmitted and stored in the icdl database. volunteers can then use the same form to translate or transliterate the metadata as necessary. finally, this information can be presented to our users when they look at books. for example the book where’s the bear? (harris, 1997) is written in six different languages.29 the original metadata came in english, but icdl volunteers translated them to italian, japanese, french, spanish, and german. users looking at the preview page for this book in the library have the opportunity to change the display language of the book to any one of these languages using a pull-down menu (figures 9 and 10). currently, only the book metadata language can be changed, but in the next stage of development, all of the surrounding interface text (navigation, labels) will be translated to different languages as well. the plan for doing this is to take a similar approach to the citidel and greenstone projects by creating a web site where volunteers can translate words and phrases from the icdl interface into their native language.30 like the creators of citidel, the team believes that machine-based translation would not provide good enough results. unfortunately, the resources do not exist for the team to do the translating themselves. encouraging volunteers to translate the site will help enlarge and enrich the icdl community. for languages that do not receive volunteer translation, translation services are an affordable alternative. ■ character-set complications several issues have arisen as a result of collecting multilingual metadata in many character sets. first, different countries use different formats for dates and times, so contributors are allowed to specify the calendar used when they enter date information (muslim or julian). second, not only do different countries use different formats for numbers, the numbers themselves are also different. for example, the arabic numbers for 1, 2, 3 are even though java is unicode-compliant, it treats numbers as latin characters, necessitating the storing of latin versions of any non-latin numbers used internally by the software for calculations, such as bookpage count. a third issue is that some of the metadata, such as author and illustrator names, need to be transliterated so their values can be displayed when the metadata are shown in a latin-based language. ideally, the transliteration standards used for a language need to be consistent so that the same values are always transliterated the same way. unfortunately, the team has found no practical way to enforce this, except to state the standard to be used in icdl metadata specification. when different standards are used, it makes comparison of equal items much more difficult. for example, the same persian/farsi creator has been figure 10. where’s the bear? in japanese figure 9. where’s the bear? 
in english the international children’s digital library | hutchinson, rose, bederson, weeks, and druin 9 10 information technology and libraries | march 2005 transliterated as both “hormoz riyaahi” and “hormoz riahi.” it cannot be assumed that a person is the same just because the name is the same (john smith), and when a name is in a character set that the team cannot understand, this problem becomes more challenging. finally, there was the question of how to handle differences in character-set length and direction in the interface. different languages use different numbers of characters to present the same text. icdl screens had to be designed in such a way that the metadata in languages with longer or shorter representations than the english version would still fit. the team anticipates having to make additional interface changes to accommodate longer labels and navigational aids when the remainder of the interface is translated. the fact also had to be considered that, while most languages are read left to right, a few (arabic and hebrew) are read right to left. as a result, screens were designed so that book metadata were reasonably presented in either direction. currently, only the text is displayed right to left, but eventually the goal is to mirror the entire interface to be oriented right to left when content is shown in right-to-left languages. for the problem of how to handle the arrows for turning pages in right-to-left languages—since these arrows could be interpreted as either “previous” and “next” or “left” and “right”—“previous” and “next” were chosen for consistency, so they work the same way in leftto-right books and right-to-left books. ■ font complications while most current browsers and operating systems recognize unicode characters, whether or not the characters are displayed properly depends on whether users have appropriate fonts installed on their computers. for instance, a user looking at where’s the bear? and choosing to display the metadata in japanese will see the japanese metadata only if the computer has a font installed that includes japanese characters. otherwise, depending on the browser and operating system, he may see question marks, square boxes, or nothing at all instead of the japanese characters. the good news is that many users will never face this problem. the interface for icdl is presented in english (until it is translated to other languages). since most operating systems come with fonts that can display english characters, the team has metadata in english (always presented first by default) for nearly all the books. users who choose to display book metadata in another language are likely to do so because they actually can read that language, and therefore are likely to have fonts installed for displaying that language. furthermore, many commonly used software packages, such as microsoft office, come with fonts for many languages. as a result, many users will have fonts installed for more languages than just those required for the native language of their operating system. of course, fonts will still be a problem for other users, such as those with new computers that have not yet been configured with different fonts or those using a public machine at a library. these users will need to install fonts so they can view book metadata, and eventually the entire interface, in other languages. to assist these users, help pages have been created to assist users with the process of installing a font on various operating systems. 
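returning to the numeral issue noted under character-set complications above, the conversion of non-latin decimal digits to latin form for internal arithmetic (for example, a book-page count entered in arabic-indic numerals) can be sketched as follows in python. this illustrates the general approach only and is not the adjustment actually made in icdl's java servlets.

import unicodedata

def to_latin_digits(text):
    """replace every decimal-digit character with its latin (ascii) equivalent."""
    out = []
    for ch in text:
        value = unicodedata.digit(ch, None)   # None for non-digit characters
        out.append(str(value) if value is not None else ch)
    return "".join(out)

# arabic-indic digits for 1, 2, 3 become "123"
assert to_latin_digits("\u0661\u0662\u0663") == "123"
# a page count stored in native digits can then be used in calculations
page_count = int(to_latin_digits("\u0662\u0664\u0664"))   # -> 244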
■ issues of interpretation while technical issues have been a major challenge for icdl, a number of nontechnical issues relating to interpretation have also been encountered. first, until the interface has been translated into different languages, visual icons are crucial for communicating information to young children who cannot read, and to users who do not speak english. however, certain pictorial representations may not be understood by all cultures, or worse, may offend some cultures. for example, one icon showing a boy sticking out his tongue had to be redesigned when it was learned this was offensive in the chinese culture. the team has also redesigned other icons, such as those using stars as the rating system for popular books. the original icons used five-sided stars, which are religiously significant, so they were changed to more neutral sevenor eight-sided stars. as the team continues to internationalize the interface, there will likely be a need to change other icons that are difficult to represent in a culturally neutral way when the interface is displayed in different languages. for instance, it is a real challenge to create icons for categories such as mythology or super heroes, since the symbols and stories for these concepts differ by culture. icons for such categories as funny, happy, and sad are also complicated because certain common american facial and hand representations have different, sometimes offensive, meanings in different cultures. what is considered funny in one culture (a clown) may not be understood well by another culture. different versions of such icons may have to be created, depending on the language and cultural preferences of users. the team relies on its multicultural members, volunteers, and advisory board to highlight these concerns. religious, social, and political problems of interpretation have also been encountered. icdl’s collection develops unevenly as relationships are built with various publishers and libraries. as a result, there are currently many arabic books and only a few hebrew books; this has generated multiple e-mails from users concerned that icdl is taking a political stance on the arab-israeli conflict. to address this concern, the team is currently working to develop a more balanced collection. many books published in hong kong are received from contributors in either hong kong or china who want their own country to be credited with publication. to address this concern, it was decided to credit the publication country as “hong kong/china” to avoid offending either party. finally, some books have been received with potentially objectionable content. some of these are historical books involving presentation of content that is now considered derogatory. some include subject matter that may be deemed appropriate by some cultures but not by others. some include information that may be too sophisticated for children three to thirteen years of age in any culture. while careful not to include books that are inappropriate for children in this age group, the team does not want to censor books whose content is subjectively offensive. instead, such contributors are consulted to make sure they were aware of icdl collection-development guidelines. if they believe that a book is historically or culturally appropriate, the book is included. 
a statement is also provided at the bottom of all the book pages indicating that the books in the library come from diverse cultures and historical periods and may not be appropriate for all users of the library. ■ conclusions and lessons learned designing a digital library for an international, intergenerational audience is a challenging process, but it is hugely rewarding. the team is continually amazed with feedback from users all over the world expressing thanks that books are made available from their countries, from teachers who use the library as a resource for lesson planning, from parents who have discovered a new way to read with their children, and from children who are thrilled to discover new favorite books that they cannot get in their local library. thus, the first recommendation the team can make based on experience is that creating international digital-library resources for children is a rich and rewarding area of research that others should continue to explore. a second important lesson learned is that an international, intergenerational team is an absolute necessity. simply having users and testers from other countries is not enough; their input is valuable, but it comes too late in the design process to influence major design changes. team members from different cultural backgrounds offer perspectives that an american-only team simply would not think to consider. similarly, team members who are children understand how children like to look for and read books, and what interface tools are difficult or easy, and fun or not fun. enthusiastic advisors and volunteers are also a crucial resource. the icdl team does not have the time, money, or resources to address all of the issues that surface, and advisors and volunteers are key resources in the development process. bringing together as diverse a team as possible is highly recommended. the goals of educational enrichment and international understanding in an international library make it an attractive resource for people to want to help, so assembling such a team is not as difficult as it sounds. beyond the human resources, the technical resources involved in making icdl an international environment necessitate the examination and adjustment of software and interfaces at every level. unlike many digital libraries that only focus on one or a few languages, icdl must be simultaneously multilingual, multicultural, and multigenerational. as a result, a third lesson is that freely available and open-source technologies are now available for making the necessary infrastructure meet these criteria. with varying degrees of complexity, the team was able to get all the pieces to work together properly. the more difficult challenge, unfortunately, falls on icdl’s users, who may need to install new fonts to view metadata in different languages. however, as computer and browser technologies advance to reflect more global applications, this problem is expected to lessen and eventually disappear. having technical staff capable of searching for and integrating open-source tools with international support to handle these technical issues is highly recommended, as well as usability staff versed in the nuances of different operating systems and browsers. finally, the more subjective issue of cultural interpretation has proven to be the most interesting challenge. 
it is one that will likely not disappear as icdl’s collection grows and the next stage of development is embarked on for translating the interface to support other languages and cultures. the fourth lesson learned is that culture pervades every aspect of both the visual design and the content of the interface, and that it is necessary to examine one’s own biased cultural assumptions to ensure respect of others. however, with the enthusiasm that continues to be seen in the icdl team members, advisors, volunteers, and users, future design challenges will be able to be addressed with their help. the final recommendation is to actively seek feedback from team members, volunteers, and users from different backgrounds about the cultural appropriateness of all aspects of your software. it may not be possible to address all cultures in your audience right away, but it is important to have a framework in place so that these issues are addressed eventually. the international children’s digital library | hutchinson, rose, bederson, weeks, and druin 11 12 information technology and libraries | march 2005 ■ acknowledgments icdl is a large project with many people who make it the wonderful resource that it has become. we thank them all for their continued hard work, as well as our many volunteers and our generous contributors. we would especially like to thank nsf for our information technology research grant, and imls for our national leadership grant. without this generous funding, our research would not be possible. references 1. internet world stats. accessed mar. 9, 2005, www.internet worldstats.com 2. national telecommunications and information administration (2004). “a nation online: entering the broadband age.” accessed mar. 9, 2005, www.ntia.doc.gov/reports/anol/index. html. 3. i. witten et al., “the promise of digital libraries in developing countries,” communications of the acm 44, no. 5 (2001): 82–85; j. downie, (2003). “realization of four important principles in cross-cultural digital library development,” workshop paper for jcdl 2003. accessed dec. 16, 2004, http://music -ir.org/~jdownie/jcdl03_workshop_downie_dun.pdf. 4. p. busey and t. doerr, “kid’s catalog: an information retrieval system for children,” youth services in libraries 7, no. 1 (1993): 77–84; u. külper, u. schulz, and g. will, “bücherschatz—a prototype of a children’s opac,” information services and use no. 17 (1997): 201–14; a. druin et al., “designing a digital library for young children: an intergenerational partnership,” in proceedings of the acm/ieee-cs joint conference on digital libraries (new york: association for computing machinery, 2001), 398–405. 5. a. druin, “what children can teach us: developing digital libraries for children with children,” library quarterly (in press). accessed dec. 16, 2004, www.icdlbooks.org. 6. a. marcus, “global and intercultural user-interface design,” in j. jacko and a. sears, eds., the human-computer interaction handbook (mahwah, n.j.: lawrence erlbaum assoc., 2002), 441–63. 7. t. fernandes, global interface design (boston: ap professional, 1995). 8. g. hofstede, cultures and organizations: software of the mind (new york: mcgraw-hill, 1991). 9. fernandes, global interface design. 10. n. hoft, “developing a cultural model,” in e. del galdo and j. nielsen, eds., international user interfaces (new york: wiley, 1996), 41–73. 11. j. nielsen, “international usability engineering,” in e. del galdo, and j. nielsen, eds., international user interfaces (new york: wiley, 1996), 1–13. 12. c. 
borgman, “multimedia, multicultural, and multilingual digital libraries: or, how do we exchange data in 400 languages?” d-lib magazine 3 (june 1997). 13. i. witten et al., “greenstone: a comprehensive opensource digital library software system,” in proceedings of digital libraries 2000 (new york: association for computing machinery, 2000), 113–21. 14. g. perlman, “the firstsearch user interface architecture: universal access for any user, in many languages, on any platform,” in cuu 2000 conference proceedings (new york: association for computing machinery, 2000), 1–8. 15. s. perugini et al., “enhancing usability in citidel: multimodal, multilingual, and interactive visualization interfaces,” in proceedings of jcdl ‘04 (new york: association for computing machinery, 2004), 315–24. 16. witten et al., “the promise of digital libraries in developing countries”; downie, “four important principles.” 17. e. duncker, “cross-cultural usability of the library metaphor,” in proceedings of jcdl ‘02 (new york: association for computing machinery, 2002), 223–30. 18. p. moore, and a. st. george, “children as information seekers: the cognitive demands of books and library systems,” school library media quarterly 19 (1991): 161–68; p. solomon, “children’s information retrieval behavior: a case analysis of an opac,” journal of the american society for information science and technology 44, no. 5 (1993): 245–64; busey and doerr, “kid’s catalog,” 77–84. 19. a. pejtersen, “a library system for information retrieval based on a cognitive task analysis and supported by an iconbased interface,” acm conference on information retrieval (new york: association for computing machinery, 1989), 40–47. 20. külper et al., “bücherschatz—a prototype of a children’s opac,” 201–14. 21. druin et al., “designing a digital library,” 398–405. 22. y. theng et al., “dynamic digital libraries for children,” in proceedings of the joint conference on digital libraries (new york: association for computing machinery, 2001), 406–15. 23. a. druin, “cooperative inquiry: developing new technologies for children with children,” in proceedings of human factors in computing (new york: association for computing machinery, 1999), 592–99. 24. j. hourcade et al., “the international children’s digital library: viewing digital books online,” interacting with computers 15 (2003): 151–67. 25. k. reuter and a. druin, “bringing together children and books: an initial descriptive study of children’s book searching and selection behavior in a digital library,” in proceedings of american society for information science and technology conference (in press). 26. international youth library, the white ravens 2004. available for purchase at. www.ijb.de/index2.html (accessed dec. 16, 2004). 27. dublin core metadata initiative. accessed dec. 16, 2004, www.dublincore.org. 28. unicode consortium (2004). accessed dec. 16, 2004, www.unicode.org. 29. j. harris, where’s the bear? (los angeles: the j. paul getty museum, 1997). 30. perugini et al., “enhancing usability in citidel,” 315–24. lib-mocs-kmc364-20140106083930 198 an algorithm for compaction of alphanumeric data william d. schieber, george w. thomas: central library and documentation branch, international labour office, geneva, switzerland description of a technique for compressing data to be placed in computer auxiliary storage. the technique operates on the principle of taking two alphabetic characters frequently used in combination and replacing them with one unused special character code. 
such une-for-two replacement has enabled the ilo to achieve a rate of compression of 43.5% on a data base of approximately 40,000 bibliographic records. introduction this paper describes a technique for compacting alphanumeric data of the type found in bibliographic records. the file used for experimentation is that of the central library and documentation branch of the international labour office, geneva, where approximately 40,000 bibliographic records are maintained on line for searches done by the library for its clients. work on the project was initiated in response to economic pressure to conserve direct-access storage space taken by this particularly large file. in studying the problem of how to effect compaction, several alternatives were considered. the first was a recursive bit-pattern recognition technique of the type developed by demaine ( 1,2), which operates mdependently of the data to be compressed. this approach was rejected because of the apparent complexity of the coding and decoding algorithms, and also because early analyses indicated that further development of the second type of approach might ultimately yield higher compression ratios. compaction of alphanumeric datajschieber and thomas 199 the second type of approach involves the replacement, by shorter nondata strings, of longer character strings known to exist with a high frequency in the data. this technique is data dependent and requires an analysis of what is to be encoded. one such method is to separate words into their component parts: prefixes, stems and suffixes; and to effect compression by replacing these components with shorter codes. there have been several successful algorithms for separating words into their components. salton ( 3) has done this in connection with his work on automatic indexing. resnikoff and dolby ( 4,5) have also examined the problem of word analysis in english for computational linguistics. although this method appears to be viable as the basis of a compaction scheme, it was here excluded because ilo data was in several languages. moreover, dolby and resnikoff's encoding and decoding routines require programs that perform extensive word analysis and dictionary look-up procedures that ilo was not in a position to develop. the actual requirements observed were twofold: that the analysis of what strings were to be encoded be kept relatively simple, and that the encoding algorithm must combine simplicity and speed presumably by minimizing the amount of dictionary look-up required to encode and decode the selected string. one of the most straightforward examples of the use of this technique is the work done by snyderman and hunt ( 6 ) that involves replacement of two data characters by single unused computer codes. however, the algorithm used by them does not base the selection of these two-character pairs (called "digrams") on their frequency of occurrence in the data. the technique described here is an attempt to improve and extend the concept by encoding digrams on the basis of frequency. the possibility of encoding longer character strings is also examined. three other related discussions of data compaction appear in papers by myers et al. (7) and by demaine and his colleagues (8,9). the compression technique the basic technique used to compact the data file specifies that the most-frequently occurring digrams be replaced by single unused specialcharacter codes. on an eight-bit character machine of the type used, there are a total of 256 possible character codes (bytes ) . 
of this total only a small number are allocated to graphics (that is, characters which can be reproduced by the computer's printer). in addition, not all of the graphics provided for by the computer manufacturer appear in the user's data base. thus, of the total code set, a large portion may go unused. characters that are unallocated may be used to represent longer character strings. the most elementary form of substitution is the replacement of specific digrams. if these digrams can be selected on the basis of frequency, the compression ratio will be better than if selection is done independent of frequency. this requires a frequency count of all digrams appearing in the data, and a subsequent ranking in order of decreasing frequency. once the base character set is defined, and the digrams eligible for replacement are selected, the algorithm can be applied to any string of text. the algorithm consists of two elements: encoding and decoding. in encoding, the string to be encoded is examined from left to right. the initial character is examined to determine if it is the first of any encodable digram. if it is not, it is moved unchanged to the output area. if it is a possible candidate, the following character is checked against a table to verify whether or not this character pair can be replaced. if replacement can be effected, the code representing the digram is moved to the output area. if not, the algorithm then moves on to treat the second character in precisely the same way as the first. the algorithm continues, character by character, until the entire string has been encoded. following is a step-by-step description of the encoding element. 1) load the length of the string into a counter. 2) set a pointer to the first character in the string. 3) check to determine whether the character pointed to can occur in combination. if the character does not occur in combination, point to the next character and repeat step 3. 4) if the character can occur in combination, check the following character in a table of valid combinations with the first character. if the digram cannot be encoded, advance the pointer to the next character and return to step 3. 5) if the digram is codable, move the preceding non-codable characters (if any) to the output area, followed by the internal storage code for the digram. 6) decrease the string length counter by one, advance the pointer two positions beyond its current value, and return to step 3. in the following example assume that only three digrams are defined as codable: ab, bc, and de. assume also that the clear text to be encoded is the six-character string abcdef. after encoding, the coded string would appear as: ab c de f (in the original figure a horizontal line over a pair represents a coded pair, and a dot marks a single, non-combined character). the encoded string above is of length four. note that although bc was defined as an encodable digram, it did not combine in the example above because the digram ab was already encoded as a pair. the characters c and f do not combine, so they remain uncoded. note also that if the digram ab had not been defined as codable, the resultant combination would have been different in this case: a bc de f. the decoding algorithm serves to expand a compressed string so that the record can be displayed or printed. as in the encoding routines, decoding of the string goes from left to right. bytes in the source string are examined one by one.
if the code represents a single character, the print code for that character is moved to the output string. if the code represents a digram, the digram is moved to the output string. decoding proceeds byte by byte as follows until the end of the string is reached: 1) load the string length into a counter. 2) set a pointer to the first byte in the record. 3) test the character. if the code represents a single character, point to the next source byte and retest. 4) if the code represents a digram, move all bytes (if any) up to the coded digram, and move in the digram. 5) increase the length value by one, point to the next source byte, and continue with step 3. application of the technique. the algorithm, when used on the data base of approximately 40,000 records, was found to yield 43.5% compaction. the file contains bibliographic records of the type shown in figure 1. fig. 1. sample record from test file (a bibliographic entry for "the data bank society: organizations, computers and social freedom," london, george allen and unwin, 1970, followed by a short abstract in which descriptors such as /social research/, /data bank/, and /computer/ are set off by slashes). each record contains a bibliographic segment as well as a brief abstract containing descriptors placed between slashes for computer identification. a large amount of blank space appears on the printed version of these records; however, the uncoded machine-readable copy does not contain blanks, except between words and as filler characters in the few fields defined as fixed-length. the average length of a record is 535 characters (10). the valid graphics appearing in the data are shown in table 1, along with the percentage of occurrence of each character throughout the entire file. table 1. single-character frequency: the percentage of occurrence of each of the valid graphics, headed by the blank at 14.87% and e at 7.63%, and tapering off to q at 0.08% and miscellaneous special characters at 0.01%. as might be expected, the blank occurs most frequently in the data because of its use as a word separator. the slash occurs more frequently than is normal because of its special use as a descriptor delimiter. it should also be noted that the data contains no lower-case characters. this is advantageous to the algorithm because it considerably lessens the total number of possible digram combinations. as a result, a larger proportion of the file is codable in the limited set chosen as codable pairs, and the absence of 26 graphics allows the inclusion of 26 additional coded pairs. in the file used for compaction there are 58 valid graphics. allowing one character for special functions leaves 197 unallocated character codes (of a total of 256 possible). a digram frequency analysis was performed on the entire file and the digrams ranked in order of decreasing frequency.
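as a concrete illustration of the frequency count and of the encoding and decoding elements just described, the following python sketch builds a digram table from a corpus and applies the left-to-right substitution. it is not the authors' ibm 360/40 assembler implementation; the use of private-use codepoints to stand in for the unused byte values, and the helper names, are assumptions made only for the example.

```python
from collections import Counter

def top_digrams(texts, max_codes=197):
    """count adjacent character pairs across a corpus and keep the most frequent."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - 1):
            counts[text[i:i + 2]] += 1
    return [pair for pair, _ in counts.most_common(max_codes)]

def build_tables(digrams):
    """give each eligible digram a one-character substitute (private-use codepoints here)."""
    encode = {pair: chr(0xE000 + i) for i, pair in enumerate(digrams)}
    decode = {code: pair for pair, code in encode.items()}
    return encode, decode

def compact(text, encode):
    """left-to-right scan: emit the substitute code for a codable digram, else copy one character."""
    out, i = [], 0
    while i < len(text):
        if text[i:i + 2] in encode:   # steps 3-5: a codable digram starts here
            out.append(encode[text[i:i + 2]])
            i += 2                    # step 6: skip past both characters
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def expand(coded, decode):
    """byte-by-byte expansion of a compacted string back to clear text."""
    return "".join(decode.get(ch, ch) for ch in coded)

encode, decode = build_tables(["ab", "bc", "de"])
coded = compact("abcdef", encode)
print(len(coded), expand(coded, decode))   # 4 abcdef
```

applied to the worked example above (codable digrams ab, bc, and de and clear text abcdef), the sketch reproduces the length-four coded string.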
from this list the first 197 digrams were selected as those which were eligible for replacement by single-character codes. table 2 shows these "encodable" digrams arranged by lead character. the algorithm was programmed in assembler language for use on an ibm 360/40 computer. the encoding element requires approximately 8,000 bytes of main storage; the decoding element requires approximately 2,000 bytes. in order to obtain data on the amount of computer time required to encode and decode the file, the following tests were performed. to find the encoding time, the file was loaded from tape to disk. the tape copy of the file was uncoded, the disk copy compacted. loading time for 41,839 records was 52 minutes and 51 seconds. the same tape-to-disk operation without encoding took 28:08. the time difference (24:43) represents encoding time for 41,839 records, or .035 seconds per record. a decoding test was done by unloading the previously coded disk file to tape. the time taken was 41:52, versus a time of 20:20 for unloading an uncompacted file. the time difference (21:32) represents decoding time for 41,839 records, or .031 seconds per record. the compaction ratio, as indicated above, was 43.5 percent. for purposes of comparison, the algorithm developed by snyderman and hunt (6) was tested and found to yield a compaction ratio of 32.5% when applied to the same data file. table 2. most frequently occurring digrams, arranged by lead character (for example, under a: ab ac ad ag ai al am an ap ar as at; under e: ea ec ed ef el em en ep er es et ev; under n: na nc nd ne ng ni no ns nt; many of the remaining eligible pairs combine a letter with the blank or with punctuation). possible extension of the algorithm. currently the compression technique encodes only pairs of characters. there might be good reason to extend the technique to the encoding of longer strings, provided a significantly higher compaction ratio could be achieved without undue increase in processing time. one could consider encoding trigrams, quadrigrams, and up to n-grams. the english word "the", for example, may occur often enough in the data to make it worth coding. the arguments against encoding longer strings are several. prime among these is the difficulty of deciding what is to be encoded. doing an analysis of digrams is a relatively straightforward affair, whereas an analysis of trigrams and longer strings is considerably more costly because there are more combinations. furthermore, if longer strings are to be encoded, the algorithms for encoding and decoding become more complex and time-consuming to employ. one approach to this type of extension is to take a particular type of character string, namely a word, and to encode certain words which appear frequently. a test of this technique was made to encode particular words in the data: descriptors. all descriptors (about 1,200 in number) appear specially marked by slashes in the abstract field of the record.
each descriptor (including the slashes) was replaced by a two-character code. after replacement, the normal compaction algorithm was applied to the record. a compaction ratio of 56.4% was obtained when encoding a small sample of twenty records (10,777 characters). the specific difficulty anticipated in this extension is the amount of either processing time or storage space which the decoding routines would require. if the look-up table for the actual descriptor values were to be located on disk, the time to retrieve and decode each record might be rather long. on the other hand, if the look-up table were to be in main storage at the time of processing, its size might exclude the ability to do anything else, particularly when on-line retrieval is done in an extremely limited amount of main storage area. a partial solution to this problem might be to keep the look-up tables for the most frequently occurring terms in main storage and the others on disk. at present further analysis is being done to determine the value of this approach. conclusions. the compaction algorithm performs relatively efficiently given the type of data used in the text data base (i.e., data without lower-case alphabetics, having a limited number of special characters, in primarily english text). the times for decoding individual records (.031 sec/record) indicate that on a normal print or terminal display operation, no noticeable increase in access time will be incurred. however, several types of problems are encountered when treating other kinds of data. since the algorithm works on the basis of replacing the most-frequently occurring n-grams by single-byte codes, the compaction ratio is dependent on the number of codes that can be "freed up" for n-gram representation. the more codes that can be reallocated to n-grams, the better the compaction. data which would pose complications to the algorithm, as currently defined, can be separated for discussion as follows: 1) data containing both upper- and lower-case characters (as well as a limited set of special characters), and 2) data which might possibly contain a wide variety of little-used special graphics. if lower-case characters are used, a possible way to encode data using this technique is to hark back to the time-honored method of representing lower-case characters with the upper-case codes, and upper-case characters by their value preceded by a single shift code (e.g., #access for access). the shift-code-plus-blank digram would undoubtedly figure relatively high on the frequency list, making it eligible as an encodable digram. the second problem occurs when one attempts to compact data having a large set of graphics. a good example of this is bibliographic data containing a wide variety of little-used characters of the type now being provided for in the marc tapes (11) issued by the u.s. library of congress (such as the icelandic thorn). normally, representation of these graphics is done by allocating as many codes as required from the possible 256-code set. since the compaction ratio is dependent on the number of unallocated internal codes, a possible solution to this dilemma might be to represent little-used graphics by multi-byte codes, which would free the codes for representation of frequently occurring n-grams. further, it is noticeable that the more homogeneous the data, the higher the compression ratio. this means that data all in one language will encode better than data in many languages.
there is, unfortunately, no ready solution to this problem, given the constraints of this algorithm. in dealing with heterogeneous data one must be prepared to accept a lower compression factor. without doubt, the ability to effect a savings of around 40% in storage space is significant. the price for this ability is computer processing time, and the more complex the encoding and decoding routines, the more time is required. there is a calculable break-even point at which it becomes economically more attractive to buy x amount of additional storage space than to spend the equivalent cost on data compaction. yet at the present cost of direct-access storage, compaction may be a possible solution for organizations with large data files. references 1. marron, b. a.; demaine, p. a. d.: "automatic data compression," communications of the acm, 10 (november 1967), 711-715. 2. demaine, p. a. d.; kloss, k.; marron, b. a.: the solid system iii: alphanumeric compression (washington, d.c.: national bureau of standards, 1967) (technical note 413). 3. salton, g.: automatic information organization and retrieval (new york: mcgraw-hill, 1968). 4. resnikoff, h. l.; dolby, j. l.: "the nature of affixing in written english," mechanical translation, 8 (march 1965), 84-89. 5. resnikoff, h. l.; dolby, j. l.: "the nature of affixing in written english," mechanical translation, 9 (june 1966), 23-33. 6. snyderman, martin; hunt, bernard: "the myriad virtues of text compaction," datamation (december 1, 1970), 36-40. 7. myers, w.; townsend, m.; townsend, t.: "data compression by hardware or software," datamation (april 1966), 39-43. 8. demaine, p. a. d.; kloss, k.; marron, b. a.: the solid system ii: numeric compression (washington, d.c.: national bureau of standards, 1967) (technical note 413). 9. demaine, p. a. d.; marron, b. a.: "the solid system i: a method for organizing and searching files," in schecter, g. (ed.): information retrieval: a critical view (washington, d.c.: thompson book co., 1967). 10. schieber, w.: isis (integrated scientific information system; a general description of an approach to computerized bibliographical control) (geneva: international labour office, 1971). 11. books: a marc format; specification of magnetic tapes containing monographic catalog records in the marc ii format (washington, d.c.: library of congress, information systems office, 1970). news and announcements. redi or not ... "public libraries and the remote electronic delivery of information (redi)," a working meeting, was held in columbus, ohio, on monday and tuesday, march 23 and 24, 1981. the meeting, jointly sponsored by the public library of columbus and franklin county (ohio) and oclc, inc., considered the issues that public libraries must examine before becoming involved in electronic information services. subjects explored included technology, communications, information providers, information users, social implications, and financial, legal, and regulatory responsibilities. tom harnish, program director of oclc's home delivery of library services program, was moderator of the two-day event. participants at the conference represented a variety of public libraries from throughout the u.s., including new york, georgia, texas, california, colorado, and illinois. don hammer represented lita at the meeting; mary jo lynch of the ala office for research also attended.
"geographic distances, " said harnish, "were the only points of separation among the meeting participants . there was an overwhelming agreement on the concerns for the future of libraries and universal access to information in the electronic age . " on the second day of the conference it became apparent that the redi agenda could not be properly dealt with in two days. "we need an organization which will address these issues on an ongoing basis," said richard sweeney, executive director of plcfc . "librarians at the conference agreed to promote and lead the development of the electronic library . to that end, this group is seeking recognition by ala as a membership initiative group with a special interest in the electronic library." the group's founders prepared the following mission statement for the membership initiative group: to ensure that information delivered electronically remains accessible to the general public, the electronic library association shall promote participation and leadership in the remote electronic delivery of information* (redi) by publicly supported libraries and nonprofit organizations . goals of the organization are to: • identify services and information that are best suited to remote electronic delivery; • plan , fund, and develop working demonstrations of library redi services ; • communicate the availability of electronic library services to the user community; · • inform the library profession of trends, specific events , and future directions of redi; • create coalitions with organizations in allied fields ·of interest. public libraries and nonprofit organizations with information interests, such as information and referral groups, are invited to join the electronic library association . the group plans to meet at the ala annual conference in san francisco. meeting details will be announced as soon as they are available . it was the goal of the "public libraries and the remote electronic delivery of information " meeting to provide th e fram e work within which to address the myriad issues in redi. the electronic library group will validate the role of libraries in technology .... redi or not here we come. *information delivered electronically where and when it is needed, in the library and elsewhere (home/office/off-site). 122 journal of library automation vol. 14/2 june 1981 arl adopts plan for improving access to microforms a plan aimed at improving bibliographic access to materials in microform by building a nationwide database of machinereadable records for individual titles in microform sets was approved in principle by the arl board of directors on january 30, 1981. the plan concentrates on monograph collections, and is aimed at providing records for individual titles in both current and retrospective sets. records add~d to the database will also aid cooperative efforts in preservation microfilming. elements of the plan include: • inputting of records conforming to accepted north american standards to the major bibliographic utilities by libraries and microform publishers; • d~ve_lopment of "profile matching" by the b1blwgraphic utilities permitting the cataloging of all individual titles in a series or microform collection with single operation; • cooperative cataloging of current and retrospective microform sets by libraries and publishers; • compensation for publishers who input acceptable bibliographic records to the bibliographic utilities to offset loss of revenue from card set sales. 
cooperation among libraries publishers, networks, and others ha's been stressed throughout the development of the plan, and initiatives on a number of fronts are necessary and encouraged in order to accomplish the goal of improved bibliographic access to microforms. arl will s_eek outside funding for a program coordmator to facilitate implementation of the elements outlined above, and recruitment for the one-year position will begin short!~ . the coordinator, advised by a committee of librarians (from arl and ~on-arl institutions) and microform publ~shers, will work with libraries, publishers, and the bibliographic utilities to help get the plan off the ground. the plan is the result of a one-year study funded by a grant from the national endowment for the humanities and conducted for arl by richard boss of information systems consultants, inc. during the course of the year, he interviewed librarians , microform publishers, representatives of the bibliographic utilities, and others interested in bibliographic access to microforms, gradually building the plan from elements on which there was agreement and discarding ideas that were not widely accepted. the effort to build a consensus among the various interested parties was aided by the advisory committee, comprising both arl librarians and microform publishers, which assisted and advised throughout the course of the project. arl will publish the study this spring. arl sponsorship of this project and its follow-up reflects the long-standing commitment the association has had to improving access to microforms . two earlier arl studies on improving bibliographic access contributed to the development of standards for descriptive cataloging of microforms, reinforced the importance of microforms for preserving and disseminating scholarly materials, and identified some of the problem areas that the current study has addressed . today, as the amount of materials in microform in arl libraries continues to grow-arl libraries hold more than 146,660,000 units of microform-improving access to these materials has taken on even greater urgency. the association of research libraries is an organization of major research libraries in the united states and canada. members include the larger university libraries, the national libraries of both countries and a number of public and special librar~ ies with substantial research collections . there are at present 111 institutional members . battelle studies using computers to access unpublished technical information engineers may be able to use computers to store, call up, and otherwise display some technical information not currently published in professional journals as a result of a study recently begun by battelle's columbus laboratories. in a four-month study sponsored by the american society of mechanical engineers (asme), battelle researchers are examining ways to use computers as an alternative to publications for communicating with the technical community. asme is a technical and educational organization with a membership of 100,000 individuals, including 17,000 student members. it conducts one of the largest technical publishing operations in the world, which includes codes, stanc dards, and operating principles for industry. according to battelle's gabor j. kovacs, certain types of information traditionally are not covered in monthly or quarterly technical journals, yet they often have widespread appeal among engineers. 
"recent advances in computer and telecommunications technologies, coupled with rapidly rising publication costs and postal rates, have created an ideal environment for organizations to consider using computers as an alternative mode of communication," kovacs said . "data bases can be used to maintain information that is impractical for conventional publication, and it is now possible to use them for many other types of communication as well." during the study, researchers will determine the feasibility of using a computer database to disseminate to asme members such information as short articles dealing with design and applications data, news and announcements 123 catalog data, and teleconference messages. with the help of the asme, battelle specialists will define the information requirements for such a system. while technology is sufficiently advanced to accommodate virtually any type of information, costs can become prohibitive unless practical compromises are made, kovacs said . as part of the study, battelle researchers also will analyze the costs associated with systems of varying capabilities. researchers then will define several alternative database systems, which will include such attributes as: • online, interactive retrieval features • simple-to-use retrieval language • user-aid features • a minimum of seventy-five simultaneous users • ability to send, store, and broadcast messages • compatibility with a variety of hard copy and crts (cathode ray tube terminals) • sixteen or more hours per day availability to accommodate different time zones • a minimum of thirty-characters-persecond transmission rates two of these alternative system de .signs-one representing a minimum capability and the other a maximum capability-then will be selected for further evaluation by battelle and the asme. 2 information technology and libraries | september 2008 andrew k. pacepresident’s message andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. w elcome to my first ital column as lita president. i’ve had the good fortune to write a number of columns in the past—in computers in libraries, smart libraries newsletter, and most recently american libraries—and it is a role that i have always cherished. there is just enough space to say what you want, but not all the responsibility of backing it up with facts and figures. in the past, i have worried about having enough to say month after month for an undefined period. now i am daunted by only having one year to address the lita membership and communicate goals and accomplishments of my quickly passing tenure. i am simultaneously humbled and extremely excited to start my presidential year with lita. i have some ambitious agenda items for the division. i said when i was running that i wanted to make lita the kind of organization that new librarians and it professionals want to join and that seasoned librarians wanted to be active in. recruitment to lita is vital, but there is also work to be done to make that recruitment even easier. i am fortunate in following up the great work of my predecessors, many of whom i have had the pleasure of serving with on the lita board since 2005. they have set the bar for me and make the coming year as challenging as anything i have done in my career. i also owe a lot to the membership who stepped forward to volunteer for committees, liaison appointments, and other volunteer opportunities. 
i also think it is important for lita members to know just how much the board relies on the faithful and diligent services of the lita staff. at my vice presidential town meeting, i talked about marketing and communication in terms of list (who), method (how), and message (what and why). not only was this a good way to do some navel gazing on what it means to be a member of lita, it laid some groundwork for the year ahead. i think it is an inescapable conclusion that the lita board needs to take another look at strategic planning (which expires this year). the approach i am going to recommend, however, is not one that tries to connote the collective wisdom of a dozen lita leaders. instead, i hope we can define a methodology by which lita committees, interest groups, and the membership at large are empowered both to do the work of the division and to benefit from it. one of the quirky things that some people know about me is that i actually love bureaucracy. i was pleased to read in the lita bylaws that it is actually my duty as president to "see that the bylaws are observed by the officers and members of the board of directors." i will tell you all that i also interpret this to mean that the president and the board will not act in ways that are not prescribed. the strength of a volunteer organization comes from its volunteers. the best legacy a lita president can provide is to give committees, interest groups, and the membership free rein to create the division's future. as for the board, its main objective is to oversee the affairs of the division during the period between meetings. frankly, we're not so great at this, and it is one of the biggest challenges for any volunteer organization. it is also one of my predecessor's initiatives that i plan to follow through on with his help as immediate past president. participation and involvement, and the ability to follow the work and strategies of the division, should be easier for all of us. so, if i were to put my platform in a nutshell it would be this: recruitment, communication, strategic planning, and volunteer empowerment. i left out fun, because it goes without saying that most of us are part of lita because it's a fun division with great members. this is a lot to get done in one year, but because it will be fun, i'm looking forward to it. the use of automatic indexing for authority control. martin dillon: university of north carolina at chapel hill; rebecca c. knight: wichita state university, wichita, kansas; margaret f. lospinuso: university of north carolina at chapel hill; and john ulmschneider: national library of medicine. thesaurus-based automatic indexing and automatic authority control share common ground as word-matching processes. to demonstrate the resemblance, an experimental system utilizing automatic indexing as its core process was implemented to perform authority control on a collection of bibliographic records. details of the system are given and results discussed. the benefits of exploiting the resemblance between the two systems are examined. introduction. it is not often realized how close the relationship is between automatic indexing using a thesaurus, on the one hand, and automatic authority control, on the other. making the connection is worthwhile for many reasons. the first has to do with terminology.
though one would be naive to hope for a reduction in specialized vocabulary, it is helpful to appreciate that what is called a thesaurus in one application is referred to as an authority file in the other; that the two have virtually the same structure, similar working parts, and play the same role in controlling the content of fields in a bibliographic file in their creation and, at least potentially, during retrievals by users. a second reason emerges in system development. below we discuss the various ways that a library can implement authority control. they range from a fully manual system, where the authority file exists only in card form, to online, automatic authority management. there are intermediate points as well. for each of the automated implementations, the system investment in software can be great. recognition of the close parallel in function of these two library needs allows for parallel development of software for any of these stages. a third reason looks to the future. successful system-patron interaction ought not to depend upon a patron's knowledge of the authorized entry forms currently in use for a library. first, the concept of a controlled vocabulary is far too narrow: authority control should encompass all fields available for searching. but the patron need not be aware of complicating details: substitutions of recognized variants for authorized forms ought to be carried out automatically during patron retrievals (with due regard, of course, for the intent of the patron). this article describes a project in authority control in a specialized system environment, one that is increasingly typical in many of its features. the file of records is relatively small, currently below 10,000, and has a potential for growth not exceeding 100,000. the collection, derived from the annabel morris buchanan collection of american religious tune books at the university of north carolina (chapel hill) music library, has many similarities with standard book collections, but its details vary greatly and cataloging conventions have been developed locally. its use for scholarly research is similar to that for any standard collection of bibliographic records. a great many such nonstandard collections exist: the morgue file in a newspaper, machine-readable data files, even properties marketed by cooperatives of real estate agencies. developing automated retrieval systems for such collections is a similar enterprise, sharing similar goals and problems. in particular, all require extensive authority control similar to that required by a tune-book collection. the important feature of the method of authority control described here, one that makes it likely to be of interest to others, is its use of the same structures and software that are used for general vocabulary control. the three major software components we will refer to below are: thesaurus maintenance, automatic indexing, and automatic updating. these components antedated our effort to implement a similar system for authority control. when the problems that dealt with authority control per se were investigated, it was discovered that the system already available for subject control could be used exactly as it stood for authority control as well. initial experiments confirmed this relationship.
1 authority control and automatic indexing. automatic authority control has been approached largely as a unique problem requiring special software development for its implementation. but authority control shares common ground with automatic subject indexing. both are term-matching activities based on a list of preferred terms plus a much larger list of match terms. each preferred term is tied to a number of match terms, but each match term is tied to only one preferred term. in the indexing environment, document text is examined for certain terms; these "free text" (uncontrolled vocabulary) terms are tied to equivalent (controlled vocabulary) terms in a thesaurus. when an uncontrolled vocabulary term is encountered in a document, its associated controlled vocabulary term is posted to the document as a descriptor. in authority control, document text is also examined for certain terms, e.g., author names. these "free-text" author names (i.e., names just as they appear on a title page) are tied to their authoritative name form (controlled vocabulary) in an authority file. when a "free-text" author name is encountered, the authoritative name is posted to the document or book (i.e., assigned as a heading or entry point). an automatic authority control system, then, is realizable by applying standard automatic subject-indexing software, which exploits the resemblance between the two processes. the input would consist of a thesaurus (in this case, an authority file) and bibliographic records; the indexing discovers matches between the list of possible terms in the thesaurus (variants of author names) and the "free-text" terms (title-page author names), and posts the appropriate controlled thesaurus terms (authoritative author name form) whenever a match occurs. (see figure 1.) fig. 1. authority control by indexing: the thesaurus (authority file) and the bibliographic records feed a matching-and-posting step that produces updated records. the tune-book project. an experimental version of an authority control system using automatic indexing was implemented to test the feasibility of automatic indexing as the core process for authority control. the goal was automatic authority control for the buchanan collection index, the first step in work on a more comprehensive project, an index of american religious tune books, in particular the shape-note tune books. for the study of american cultural and musical history it is important to be able to trace the dissemination of these hymn tunes and texts, but the absence of a comprehensive index of american hymn tune books severely constrains such studies. many factors have discouraged scholars from constructing an index, among them the magnitude of the repertory. using computers to sort, file, and print reduces many of the problems associated with the size of the repertory, but does not address those created by the diverse forms of names and texts used by the tune-book compilers. correct hymn titles and especially accurate composer attributions were not important to the compilers of the tune books. consequently, although many tune-book compilers did attempt to indicate who had composed the work, the names of the composers appeared in various forms. for example, the name "israel holdroyd" might appear as simply "holdrad" or "holdrayd" with no first name given, or a first initial might be added, or an abbreviated first name, such as "is."
might be used with one of several forms of the family name. automatic authority control over these names is necessary to the study of this collection, since only automatic means can address the problems of magnitude encountered in approaching the index as a whole. the database now contains about 6,000 records for these tune books. they are stored in marc format with variable-length fields giving a variety of information about each tune . creation of the authority file a thesaurus of authority records for the buchanan collection was manually created and placed in an online file. the initial authority file comprises a selection of composers whose names are present in conflicting forms in the present database. these were obtained by analyzing the file sorted by tune names, noting those tunes for which it appeared that the name of the same composer was given in more than one form. all forms of the name found were entered on cards along with the name of the tune (or tunes) through which the relationship was established . we used an explicit algorithm as a guide in determining which names were actually forms of the same name (see appendix for details). this process resulted in a list of 266 distinct composers, each with one to four different name forms. all were compared with the list sorted by composers, noting additional forms. these names were then checked in several reference works, and authoritative forms (with dates) were established when possible. implementation software systems file processing for the tune records and the authority thesaurus was 272 journal of library automation vol. 14/4 december 1981 accomplished using a local software product, bibliographic/marc processing system (bps). bps is a general-purpose software package for the manipulation of marc-format records. this experiment used bps subsystems for creation of marc-format records, sorting and formatting, and file updating (i.e., updating a master file with the contents of a transaction file). the automatic indexing program used here was intended as part of a thesaurus-based document query system. 2 it is compatible with bps, but utilizes generalized automatic indexing principles-its compatibility depends only on properly formatted thesaurus and bibliographic records. it includes file-processing programs for the thesaurus (authority file) and the bibliographic records (tune records) and a matching program that performs the indexing. posting of the authoritative name forms to the proper marc record is done with standard bps updating procedures using output from the matching program. automatic authority control process as input the system uses a thesaurus and the text of fields selected from marc-format document records. the thesaurus consists of pairs of terms: the first of each pair is the term searched for in a document, the second is the authority term assigned to the document, whenever the first term is found. figure 2 gives examples. the text may be abstracts, titles, or the contents of any field selected from the documents for authority control. in this case, the text is derived from the composer field; for authority work in general, any field requiring authority control would be input. the first step in authority control is as follows. the text sample and a stop-word list are input to the initial text-processing program. the incomau'ihcrity fcri'i cole, j_ i cvle, joh~ 1774-1855 clarkf", thos. 1 clark, thomas \:ol e!' , ~ eo. i cuzens, 9. / cuzens, benjamin ilall , ::;_ bi ba 11 , r. 
fholraj / hcld r oyd , israel aolroyd i hcldroyd, israel fig. 2. thesaurus/authority file format . automatic indexing/dillon , et al. 273 ing text (in this case, composer names) is separated into individual words. the stop-word list is used to remove designated words from the input, which in authority control might be titles of address and so onterms such as "miss," "elder," or "reverend." (automatic indexing uses the stop-word list to eliminate similarly noncontributory terms, such as conjunctions and prepositions.) the processing program can also convert plurals to singulars if desired. the purpose of this option in automatic indexing is to pare down variants in order to increase matches by standardizing term forms. however, plurals are not converted in authority control, since names are usually distinguished from one another by their full forms. the processing produces a list of individual terms. each term is given once along with the number of words in the term, then broken up with the document number attached to each piece. the thesaurus authority records are edited by the thesaurus processing program into specially formatted matched pairs of variant and authoritative forms. input is the match-term/variant-term file (figure 2) and the same stop-word list used for document processing. the stop-word list eliminates all unwanted words in the list of variant name forms. output is a file containing all possible name forms (variants), the number of terms in each name and their positions in the name, and the authoritative name form, as in figure 3. next the two files are used as input to a matching program that creates an inverted file of the processed document text, then compares each match term from the prepared thesaurus with the inverted file. a match is discovered according to one of the following criteria: 1. exact match: match term and document term are the same words, in the same order, and adjacent. 2. stop word exact match: words are the same in match term and in document term, and in order, but deleted stop words may intervene between words in the document term. 3. any order match: term must be the same words and adjacent (i.e., without intervening words) and may be in any order. va!'iani twc!ld s ~:utiv~ auti:-ci\ily ?cs: no fch hlstin'js, 'ihos. 2 1 2 rastinq~ , tl:hii.l s 17~4-l tl7 _ hastl.nqs, l h:>s :le i 1 2 rds tl nq.< , th.:>llll s 17cl~ 1 -!72 holde a':! l!ol:lccyd , l s cd: ab-1054, .\3-166q, ad-1248, aq-133b, ••• fig. 4. update file. results table 1 gives some statistics on the experimental runs. in the 5, 788 bibliographic records, 760 distinct composer names were present, the remainder (one composer per record) being duplicate forms; many of these are simply "anon," where the composer was not known. earlier test runs on a subset of the file had fewer duplicates, and additions to the full database show few new composer name forms. thus the database is nearing a stable state with an exhaustive list of composers; this stability contribtable 1. 
implementation statistics f ile statistics: total number of bibliograp hi c records number of composer names in biblio reco rds ave rage number of compositions per composer tota l number of authorit y na me forms (in authority file) tota l number of variant and authority names (in authority file) run statisti cs: total number of variant thesauru s names matched total numbe r of variant thesaurus n am es unmatched average number of documents per match ed ter m average number of docume nts per term total number of reeords updated b y authority form 5,788 760 13.2 266 599 372 213 5.87 3.61 2, 110 276 journal of library automation vol. 14/4 december 1981 jqc 10: af1 14 7 .\nt ho l o.:; y ; 'i h <:> ~ n ion ilih jl on y i mjrin : : sel~cted ty ;ecr qe y~njr~ckson tune na:1e: i e::-usa lem firs: lin~:je~us, my all tc h~~v•n is gone, pcn: walk e r, william 18 09 -187 5 cc.'1p!)3:':r: loi al k e r, \ojr • joc i d: aa-1353 "antholo.:;y: the sacred harp imprinl': oy 3. f. lthite, e . j. king [and d.p. white}--4th ed.---atalnta : d. p. byrd, 1870 tune name: the hilt cf zion frgsr ~ine:the hill cf zion yield s , pc~: white, benjamin franklin 1800-1879 coi1po ser: white, b. f. )ot: id: afl -1100 anthology: the culcia;er imprint : or, 'ihe new york coll~ction of ~acred music 1 by i. b. woccbury. --neli york f. j. huntington tune name: carson first line:jesus an1 shall it ever be, pcn: bradbury, williaa; batchelder 1816-1868 composer: er, w. !l. fig. 5. updated records. utes to decreasing errors and fewer unmatched composer names in the automated authority control process. the total numbe r of thesaurus records matched applies to variant forms, authoritative forms (matching occurs for these also) , and for those few forms that have no variants. the unmatched terms (213) are largely variants not in the database but gleaned from reference sources in anticipation of their occurrence, and authority forms, most of which do not occur in the database. the 2, 110 matched represent the total number of composer names matched of the originals, 788 names. most of the unmatched names are the "anon" entries (more than 2 ,000); the remainder are unanticipated forms not detected in the initial manual construction of the authority file. these unanticipated forms become new variants added to the authority file as described above. conclusions automated authority control as presented here has a number of advantages, either for libraries with their own processing facilities or for the management of information collections outside the standard library environment. unifying the processes of subject control and authority control by using the same procedures and software for both simplifies the tasks of automatic indexing/dillon, et al. 277 systems personnel and information managers. where catalog access is online, the patron benefits by applying subject access facilities to other searches. ideally, substitutions for all variants would occur automatically, accompanied by an alerl lo the patron where it was felt necessary. at a minimum, the same command structure would be available for referencing names as would be normally available for consulting an online thesaurus. in either case, the difficulties of the patron are reduced, both in comprehending how the system works, and in acquiring a facility for using system commands. references 1. 
gordon ellyson jessee, "authority control: a study of the concept and its implementation using an automated indexing system" (master's paper, school of library science, university of north carolina at chapel hill, 1980). 2. margaret s. strode, "automatic indexing using a thesaurus" (master's thesis, department of computer science, university of north carolina at chapel hill, 1977). appendix rules for decisions on similar names the following conditions may exist: a = identical tune name b = identical surname c = identical first initial d = same first letter of surname and close match of the rest of the surname. (55 percent match of latters in content, not in order. such a similarity is presumed to represent a similarity in sound. ) e = similar tune name (same criteria as in d for percentage of match). exception: words "new" and "old" cancel any presumed relation between similar tune names. f = information in cmp subfield x field is identical in content the following combinations of conditions indicate the same person, expressed in decreasing order of reliability: l. a&b 2. b&c 3. a&d 4. c&d 5. b&e 6. c&d&e 7. d&e 8. f&(bord) note: points seven and eight are regarded as tentative, and matches using these combinations are flagged for later checking. martin dillon is associate professor of library science at the university of north carolina at chapel hill. rebecca c. knight is administrative services librarian at wichita state university, wichita, kansas. margaret f. lospinuso is music librarian at the university of north carolina at chapel hill. john ulmschneider is library associate at the national library of medicine. microsoft word ital_december_gerrity_final.docx   editor’s comments bob gerrity     information  technologies  and  libraries  |  september  2013   3     this  month’s  issue   we  have  an  eclectic  mix  of  content  in  this  issue  of  information  technology  and  libraries.   lita  president  cindi  trainor  provides  highlights  of  the  recent  lita  forum  in  louisville  and   planned  lita  events  for  the  upcoming  ala  midwinter  meeting  in  philadelphia,  including  the  lita   town  meeting,  the  always-­‐popular  top  tech  trends  panel,  and  the  association’s  popular   “networking  event”  on  sunday  evening.     ital  editorial  board  member  jerome  yavarkosky  describes  the  significant  benefits  that   immersive  technologies  can  offer  higher  education.  the  advent  of  massive  open  online  courses   (moocs)  would  seem  to  present  an  ideal  framework  for  the  development  of  immersive  library   services  to  support  learners  who  may  otherwise  lack  access  to  quality  library  resources  and   services.   responsive  web  design  is  the  topic  of  a  timely  article  by  hannah  gascho  rempel  and  laurie  m.   bridges,  who  examine  what  tasks  library  users  actually  carry  out  on  a  library  mobile  website  and   how  this  has  informed  oregon  state  university  libraries’  adoption  of  a  responsive  design   approach  for  their  website.   piotr  praczyk,  javier  nogueras-­‐iso,  and  salvatore  mele  present  a  method  for  automatically     extracting  and  processing  graphical  content  from  scholarly  articles  in  pdf  format  in  the  field  of   high-­‐energy  physics.  the  method  offers  potential  for  enhancing  access  and  search  services  and   bridging  the  semantic  gap  between  textual  and  graphical  content.   
elizabeth  thorne  wallington  describes  the  use  of  mapping  and  graphical  information  systems   (gis)  to  study  the  relationship  between  public  library  locations  in  the  st.  louis  area  and  the   socioeconomic  attributes  of  the  populations  they  serve.  the  paper  raises  interesting  questions   about  how  libraries  are  geographically  distributed  and  whether  they  truly  provide  universal  and   equal  access.     vadim  gureyev  and  nikolai  mazov  present  a  method  for  using  bibliometric  analysis  of  the   publication  output  of  two  research  institutes  as  a  collection-­‐development  tool,  to  identify  journals   most  important  for  researchers  at  the  institutes.       bob  gerrity  (r.gerrity@uq.edu.au)  is  university  librarian,  university  of  queensland,  australia.       editor’s comments bob gerrity   editor’s  comments  |  gerrity       4         editorial board thoughts | eden 109 editorial board thoughts bradford lee eden musings on the demise of paper w e have been hearing the dire predictions about the end of paper and the book since microfiche was hailed as the savior of libraries decades ago. now it seems that technology may be finally catching up with the hype. with the amazon kindle and the sony reader beginning to sell in the marketplace despite the cost (about $360 for the kindle), it appears that a whole new group of electronic alternatives to the print book will soon be available for users next year. amazon reports that e-book sales quadrupled in 2008 from the previous year. this has many technology firms salivating and hoping that the consumer market is ready to move to digital reading as quickly and profitably as the move to digital music. some of these new devices and technologies are featured in the march 3, 2009, fortune article by michael v. copeland titled “the end of paper?”1 part of the problem with current readers is their challenges for advertising. because the screen is so small, there isn’t any room to insert ads (i.e., revenue) around the margins of the text. but new readers such as plastic logic, polymer vision, and firstpaper will have larger screens, stronger image resolution, and automatic wireless updates, with color screens and video capabilities just over the horizon. still, working out a business model for newspapers and magazines is the real challenge. and how much will readers pay for content? with everything “free” over the internet, consumers have become accustomed to information readily available for no immediate cost. so how much to charge and how to make money selling content? the plastic logic reader weighs less than a pound, is one-eighth of an inch thick, and resembles an 8½ x 11 inch sheet of paper or a clipboard. it will appear in the marketplace next year, using plastic transistors powered by a lithium battery. while not flexible, it is a very durable and break-resistant device. other e-readers will use flexible display technology that allows one to fold up the screen and place the device into a pocket. much of this technology is fueled by e-ink, a start-up company that is behind the success of the kindle and the reader. they are exploring the use of color and video, but both have problems in terms of reading experience and battery wear. in the long run, however, these issues will be resolved. expense is the main concern: just how much are users willing to pay to read something in digital rather than analog? 
amazon has been hugely successful with the kindle, selling more than 500,000 for just under $400 in 2007. and with the drop in subscriptions for analog magazines and newspapers, advertisers are becoming nervous about their futures. or will the “pay by the article” model, like that used for digital music sales, become the norm? so what should or do these developments mean for libraries? it means that we should probably be exploring the purchase of some of these products when they appear and offering them (with some content) for checkout to our patrons. many of us did something similar when it became apparent that laptops were wanted and needed by students for their use. many of us still offer this service today, even though many campuses now require students to purchase them anyway. offering cutting-edge technology with content related to the transmission and packaging of information is one way for our clientele to see libraries as more than just print materials and a social space. and libraries shouldn’t pay full price (or any price) for these new toys; companies that develop these products are dying to find free research and development focus groups that will assist them in versioning and upgrading their products for the marketplace. what better avenue than college students? related to this is the recent announcement by the university of michigan that their university press will now be a digital operation to be run as part of the library.2 decreased university and library budgets have meant that university presses have not been able to sell enough of their monographs to maintain viable business models. the move of a university press to a successful scholarly communication and open-source publishing entity like the university of michigan libraries means that the press will be able to survive, and it also indicates that the newer model of academic libraries as university publishers will have a prototypical example to point out to their university’s administration. in the long run, these types of partnerships are essential if academic libraries are to survive their own budget cuts in the future. references 1. michael v. copeland, “the end of paper?” cnnmoney .com, mar. 3, 2009, http://money.cnn.com/2009/03/03/ technology/copeland_epaper.fortune/ (accessed june 22, 2009). 2. andrew albanese, “university of michigan press merged with library, with new emphasis on digital monographs,” libraryjournal.com, mar. 26, 2009, http://www .libraryjournal.com/article/ca6647076.html (accessed june 22, 2009). bradford lee eden (eden@library.ucsb.edu) is associate university librarian for technical services and scholarly communication, university of california, santa barbara. levan opensearch and sru | levan 151 not all library content can be exposed as html pages for harvesting by search engines such as google and yahoo!. if a library instead exposes its content through a local search interface, that content can then be found by users of metasearch engines such as a9 and vivísimo. the functionality provided by the local search engine will affect the functionality of the metasearch engine and the findability of the library’s content. this paper describes that situation and some emerging standards in the metasearch arena that choose different balance points between functionality and ease of implementation. editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital. 
฀ the content provider’s dilemma consider the increasingly common situation in which a library wants to expose its digital content to its users. suppose it knows that its users prefer search engines that search the contents of many sites simultaneously, rather than site-specific engines such as the one on the library’s web site. in order to support the preferences of its users, this library must make its contents accessible to search engines of the first type. the easiest way to do this is for the library to convert its contents to html pages and let the harvesting search engines such as google and yahoo! collect those pages and provide searching on them. however, a serious problem with harvesting search engines is that they place limits on how much data they will collect from any one site. google and yahoo! will not harvest a 3-million-record book catalog, even if the library can figure out how to turn the catalog entries into individual web pages. an alternative to exposing library content to harvesting search engines as html pages is to provide a local search interface and let a metasearch engine combine the results of searching the library’s site with the results from searching many other sites simultaneously. users of metasearch engines get the same advantage that users of harvesting search engines get (i.e., the ability to search the contents of many sites simultaneously) plus those users get access to data that the harvesting search engines do not have. the issue for the library is determining how much functionality it must provide in its local search engine so that the metasearch engine can, in turn, provide acceptable functionality to its users. the amount of functionality that the library provides will determine which metasearch engines will be able to access the library’s content. metasearch engines, such as a9 and vivísimo, are search engines that take a user’s query, send it to other search engines, and integrate the responses.1 the level of integration usually depends on the metasearch engine’s ability to understand the responses it receives from the various search engines it has queried. if the response is html intended for display on a browser, then the metasearch engine developers have to write code to parse through the html looking for the content. in such a case, the perceived value of the content determines the level of effort that the metasearch engine developers put into the parsing task; low-value content will have a low priority for developer time and will either suffer from poor integration or be excluded. for metasearch engines to work, they need to know how to send a search to the local search engine and how to interpret the results. metasearch engines such as vivísimo and a9 have staffs of programmers who write code to translate the queries they get from users into queries that the local search engines can accept. metasearch engines also have to develop code to convert all the responses returned by the local search engines into some common format so that those results can be combined and displayed to the user. this is tedious work that is prone to breaking when a local search engine changes how it searches or how it returns its response. the job of the metasearch engine is made much simpler if the local search engine supports a standard search interface such as sru (search and retrieve url) or opensearch. ฀ what does a metasearch engine need in order to use a local search engine? the search process consists of two basic steps. first, the search is performed. 
second, records are retrieved. to do a search, the metasearch engine needs to know:
1. the location of the local search engine
2. the form of the queries that the local search engine expects
3. how to send the query to the local search engine
to retrieve records, the metasearch engine needs to know:
4. how to find the records in the response
5. how to parse the records
opensearch and sru: a continuum of searching
ralph levan
ralph levan (levan@oclc.org) is a research scientist at oclc online computer library center in dublin, ohio.
four protocols
this paper will discuss four search protocols: opensearch, opensearch 1.1, sru, and the metasearch xml gateway (mxg).2 opensearch was initially developed for the a9 metasearch engine. it provides a mechanism for content providers to notify a9 of their content. it also allows rss (really simple syndication) browsers to display the results of a search.3 opensearch 1.1 has just been released. it extends the original specification based on input from a number of organizations, microsoft being prominent among them. sru was developed by the z39.50 community.4 recognizing that their standard (now eighteen years old) needed updating, they simplified it and created a new web service based on an xml encoding carried over http. the mxg protocol is the product of the niso metasearch initiative, a committee of metasearch engine developers, content providers, and users.5 mxg uses sru as a starting place, but eases the requirement for support of a standard query grammar.
functionality versus ease of implementation
a library rarely has software developers. the library's area of expertise is, first of all, the management of content and, secondarily, content creation. librarians use tools developed by other organizations to provide access to their content. these tools include the library's opac, the software provided to search any licensed content, and the software necessary to build, maintain, and access local digital repositories. for a library, ease of adoption of a new search protocol is essential. if support for the search protocol is built into the library's tools, then the library will use it. if a small piece of code can be written to convert the library's existing tools to support the new protocol, the library may do that. similarly, the developers of the library's tools will want to expend the minimum effort to support a new search protocol. the tool developer's choice of search protocol to support will depend on the tension between the functionality needed and the level of effort that must be expended to provide and maintain it. if low functionality is acceptable, then a small development effort may be acceptable. high functionality will require a greater level of effort. the developers of the search protocols examined here recognize this tension and are modifying their protocols to make them easier to implement. the new opensearch 1.1 will make it easier for some local search-engine providers to implement by easing some of the functionality requirements of version 1.0.
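to ground the discussion, here is a minimal sketch of what a search request looks like from the metasearch engine's side when the local search engine speaks sru; the endpoint url is hypothetical, the request parameters (operation, version, query, maximumRecords, recordSchema) come from the sru specification, and the query itself is written in cql.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SRU_BASE = "https://catalog.example.edu/sru"   # hypothetical local search engine

def sru_search(cql_query, max_records=10):
    """send a cql query to an sru server and return the raw xml response."""
    params = {
        "operation": "searchRetrieve",   # sru request parameters
        "version": "1.1",
        "query": cql_query,              # e.g. 'title = "metasearch" and creator = "smith"'
        "maximumRecords": str(max_records),
        "recordSchema": "dc",            # ask for records as simple dublin core
    }
    url = SRU_BASE + "?" + urlencode(params)
    with urlopen(url) as response:
        return response.read()

# example: records_xml = sru_search('title = "metasearch"')
#
# the equivalent opensearch request is nothing more than the url template from
# the target's description record with the user's terms substituted in, e.g.
#   https://repo.example.edu/search?q=metasearch&format=rss   (hypothetical)
```

the point of the sketch is the division of labor the protocols imply: the hard part, translating the user's query into the cql string, sits with whoever writes the client or the content provider's cql translator, while the http request itself is trivial.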
similarly, the niso metasearch committee has defined mxg, a variant of sru that eases some of the requirements of sru.6 ฀ search protocol basics once again, the five basic pieces of information that a metasearch engine needs in order to communicate effectively with a local search engine are: (1) local search engine location, (2) the query-grammar expected, (3) the request encoding, (4) the response encoding, and (5) the record encoding. the four protocols provide these pieces of information to one degree or another (see table 1). the four protocols expose a site’s searching functionality and return responses in a standard format. all of these protocols have some common properties. they expect that the content provider will have a description record that describes the search service. all of these services send searches via http as simple urls, and the responses are sent back as structured xml. to ease implementation, opensearch 1.1 allows the content provider to return html instead of xml. all four protocols use a description record to describe the local search engine. the opensearch protocols define what a description record looks like, but not how it is retrieved. the location of the description record is discovered by some means outside the protocol (a priori knowledge). the description record specifies the location of the local search engine. the sru protocols define what a description record looks like and specifies that it can be obtained from the local search engine. the location of the local search engine is provided by a means outside the protocol (a priori knowledge again). each protocol defines how to formulate the search url. opensearch does this by having the local search-engine provider supply a template of the url in the description record. sru does this by defining the url. opensearch and mxg do not define how to formulate the query. the metasearch engine can either pass the user’s query along to the local search engine unchanged or reformulate the query based on information about the local search engine’s query language that it has gotten by outside means (more a priori knowledge). in the first case, the metasearch engine has to hope that some magic will happen and the local search engine will do something useful with the query. in the latter case, the metasearch engine’s staff has to develop a query translator. sru specifies a standard query grammar: cql (common query language).7 this means that the metasearch engine only has to write one translator for all the sru local search engines in the world. but it also means that all the sru local search engines have to support the cql query grammar. since there are no local search engines that support cql as their native query grammar, the content provider is left with the task of translating cql queries into their native query grammar. the query translation task has moved from the metasearch engine to the content provider. opensearch and sru | levan 153 opensearch 1.0, mxg, and sru define the structure of the query response. in the case of opensearch, the response is returned as an rss message, with a couple of extra elements added. mxg and sru define an xml schema for their responses. opensearch 1.1 allows the local search engine to return the response as unstructured html. this moves the requirement of creating a standard response from the content provider and leaves the metasearch engine with the much tougher task of finding the content embedded in html. 
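by way of contrast with screen-scraping html, the sketch below shows how little code a structured response requires. the url template is hypothetical and would come from a target's opensearch description record ({searchTerms} is the substitution point that record defines), and channel, item, title, and link are ordinary rss elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus
from urllib.request import urlopen

# hypothetical template taken from a target's opensearch description record
TEMPLATE = "https://repo.example.edu/search?q={searchTerms}&format=rss"

def opensearch_results(user_query):
    """fill in the url template, fetch the rss response, and return (title, link) pairs."""
    url = TEMPLATE.replace("{searchTerms}", quote_plus(user_query))
    with urlopen(url) as response:
        tree = ET.parse(response)
    items = tree.getroot().findall("./channel/item")
    return [(item.findtext("title", ""), item.findtext("link", "")) for item in items]

# the same few lines work for any target that returns the standard structure;
# a target that returns raw html instead needs its own hand-written, and
# fragile, screen-scraper.
```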
if the metasearch engine doesn't write code to parse the response, then all it can do is display the response. it will not be able to combine the response from the local search engine with the responses from other engines. sru and mxg require that records be returned in xml and that the local search engine must specify the schema for those records in the response. this leaves the content provider with the task of formatting the records according to the schema of their choice, a task that the content provider is probably best able to do. in turn, the metasearch engine can convert the returned records into some common format so that the records from multiple local search engines can be combined into a single response. because the records are encoded in xml, it is assumed that standard xml formatting tools can be used for the conversion. opensearch does not define how records should be structured. the opensearch response has a place for the title of the record and a url that points to the record. the structure of the record is undefined. this leaves the metasearch engine with the task of parsing the record that is returned. again, the effort moves from the content provider to the metasearch engine. if the metasearch engine does not or cannot parse the records, then it can at least display the records in some context, but it cannot combine them with the records from another local search engine.
conclusion
these protocols sit on a spectrum of complexity, trading the content provider's complexity for that of the search engine. however, with lessened complexity for the metasearch engine comes increased functionality for the user. metasearch engines have to choose what content providers they will search. those that provide a high level of functionality can be easily combined with their existing local search engines. content providers with a lower level of functionality will either need additional development by the metasearch engine or will not be searched. not all metasearch engines require the same level of functionality, nor will they be prepared to accept content with a low level of functionality. content providers, such as digital libraries and institutional repositories, will have to choose the functionality they need to support to reach the metasearch engines they desire.
table 1. comparison of requirements of four metasearch protocols for effective communication with local search engines
protocol feature               opensearch 1.1   opensearch 1.0   mxg        sru
local search engine location   a priori         a priori         a priori   a priori
request encoding               defined          defined          defined    defined
response encoding              none             rss              xml        xml
record encoding                none             none             xml        xml
query grammar                  none             none             none       cql
references and notes
1. joe barker, "meta-search engines," in finding information on the internet: a tutorial (u.c. berkeley: teaching library internet workshops, aug. 23, 2005 [last update]), www.lib.berkeley.edu/teachinglib/guides/internet/metasearch.html (accessed may 8, 2006).
2. a9.com, "opensearch specification," http://opensearch.a9.com/spec/ (accessed may 8, 2006); a9.com, "opensearch 1.1," http://opensearch.a9.com/spec/1.1/ (accessed may 8, 2006).
3. mark pilgrim, "what is rss?" o'reilly xml.com, dec. 18, 2002, www.xml.com/pub/a/2002/12/18/dive-into-xml.html (accessed may 8, 2006).
4. the library of congress network development and marc standards office, "z39.50 maintenance agency page," www.loc.gov/z3950/agency/ (accessed may 8, 2006).
5. national information standards organization, "niso metasearch initiative," www.niso.org/committees/ms_initiative.html (accessed may 8, 2006).
6. niso metasearch initiative task group 3, "niso metasearch xml gateway implementors guide, version 0.2," may 16, 2005, [microsoft word document] www.lib.ncsu.edu/nisomi/images/0/06/niso_metasearch_initiative_xml_gateway_implementors_guide.doc (accessed may 8, 2006); the library of congress, "sru: search and retrieve via url; sru version 1.1 13 february 2004," www.loc.gov/standards/sru/index.html (accessed may 8, 2006).
7. the library of congress, "common query language; cql version 1.1 13th february 2004," [web page] www.loc.gov/standards/sru/cql/index.html (accessed may 8, 2006).
information discovery insights gained from multipac, a prototype library discovery system
alex a. dolski
at the university of nevada las vegas libraries, as in most libraries, resources are dispersed into a number of closed "silos" with an organization-centric, rather than patron-centric, layout. patrons frequently have trouble navigating and discovering the dozens of disparate interfaces, and any attempt at a global overview of our information offerings is at the same time incomplete and highly complex. while consolidation of interfaces is widely considered to be desirable, certain challenges have made it elusive in practice. multipac is an experimental "discovery," or metasearch, system developed to explore issues surrounding heterogeneous physical and networked resource access in an academic library environment. this article discusses some of the reasons for, and outcomes of, its development at the university of nevada las vegas (unlv).
the case for multipac
fragmentation of library resources and their interfaces is a growing problem in libraries, and unlv libraries is no exception. electronic information here is scattered across our innovative webpac; our main website; our three branch library websites; remote article databases; local custom databases; local digital collections; special collections; other remotely hosted resources (such as libguides); and others. the number of these resources, as well as the total volume of content offered by the libraries, has grown over time (figure 1), while access provisions have not kept pace in terms of usability. in light of this dilemma, the libraries and various units within have deployed finding and search tools that provide browsing and searching access to certain subsets of these resources, depending on criteria such as the type of resource; its place within the libraries' organizational structure; its place within some arbitrarily defined topical categorization of library resources; the perceived quality of its content; and its uniqueness relative to other resources. these tools tend to be organization-centric rather than patron-centric, as they are generally provisioned in relative isolation from each other without placing as much emphasis on the big picture (figure 2). the result is, from the patron's perspective, a disaggregated mass of information and scattered finding tools that, to varying degrees, each accomplishes its own specific goals at the expense of macro-level findability.
currently, a comprehensive search for a given subject across as many library resources as possible might involve visiting a half-dozen interfaces or more—each one predicated upon awareness of each individual interface, its relation to the others, and the characteristics of its specific coverage of the corpus of library content.
figure 1. "silos" in the library
figure 2. organization-centric resource provisioning
alex a. dolski (alex.dolski@unlv.edu) is web & digitization application developer at the university of nevada las vegas libraries.
our library website serves as the de facto gateway to our electronic, networked content offerings. yet usability studies have shown that findability, when given our website as a starting point, is poor. undoubtedly this is due, at least in part, to interface fragmentation. test subjects, when given a task to find something and asked to use the library website as a starting point, fail outright in a clear majority of cases.1 multipac is a technical prototype that serves as an exploration of these issues. while the system itself breaks no new technical ground, it brings to the forefront critical issues of metadata quality, organizational structure, and long-term planning that can inform future actions regarding strategy and implementation of potential solutions at unlv and elsewhere. yet it is only one of numerous ways that these issues could be addressed.2 in an abstract sense, multipac is biased toward principles of simplification, consolidation, and unification. in theory, usability can be improved by eliminating redundant interfaces, consolidating search tools, and bringing together resource-specific features (e.g., opac holdings status) in one interface to the maximum extent possible (figure 3). taken to an extreme, this means being able to support searching all of our resources, regardless of type or location, from a single interface; abstracting each resource from whatever native or built-in user interface it might offer; and relying instead on its data interface for querying and result-set gathering. thus multipac is as much a proof-of-concept as it is a concrete implementation.
background: how multipac became what it is
multipac came about from a unique set of circumstances. from the beginning, it was intended as an exploratory project, with no serious expectation of it ever being deployed. our desire to have a working prototype ready for our discovery mini-conference meant that we had just six weeks of development time, which was hardly sufficient for anything more than the most agile of development models. the resulting design, while foundationally solid, was limited in scope and depth because of time constraints. another option, instead of developing multipac, would have been to demonstrate an existing open-source discovery system. the advantage of this approach is that the final product would have been considerably more advanced than anything we could have developed ourselves in six weeks. on the other hand, it might not have provided a comparable learning opportunity.
survey of similar systems
were its development to continue, multipac would find itself among an increasingly crowded field of competitors (table 1). a number of library discovery systems already exist, most backed by open-source or commercially available back-end search engines (table 2), which handle the nitty-gritty, low-level ingestion, indexing, and retrieval. these lists of systems are by no means comprehensive and do not include notable experimental or research systems, which would make them much longer.
table 1. some popular existing library discovery systems
name                 company/institution        commercial status
aquabrowser          serials solutions          commercial
blacklight           university of virginia     open-source (apache)
encore               innovative interfaces      commercial
extensible catalog   university of rochester    open-source (mit/gpl)
libraryfind          oregon state university    open-source (gpl)
metalib              ex libris                  commercial
primo                ex libris                  commercial
summon               serials solutions          commercial
vufind               villanova university       open-source (gpl)
worldcat local       oclc                       commercial
table 2. some existing back-end search servers
name                        company/institution    commercial status
endeca                      endeca technologies    commercial
idol                        autonomy               commercial
lucene                      apache foundation      open-source (apache)
search server               microsoft              commercial
search server express       microsoft              free
solr (superset of lucene)   apache foundation      open-source (apache)
sphinx                      sphinx technologies    open-source (gpl)
xapian                      community              open-source (gpl)
zebra                       index data             open-source (gpl)
architecture
in terms of how they carry out a search, metasearch applications can be divided into two main groups: distributed (or federated) search, in which searches are "broadcast" to individual resources that return results in real time (figure 4); and harvested search, in which searches are carried out against a local index of resource contents (figure 5).3 both have advantages and disadvantages beyond the scope of this article. multipac takes the latter approach. it consists of three primary components: the search server, the user interface, and the metadata harvesting system (figure 6).
figure 3. patron-centric resource provisioning
figure 4. the federated search process
figure 5. the harvested search process
figure 6. the three main components of multipac
search server
after some research, solr was chosen as the search server because of its ease of use, proven library track record, and http-based representational state transfer (rest) application programming interface (api), which improves network-topological flexibility, allowing it to be deployed on a different server than the front-end web application—an important consideration in our server environment.4 jetty—a java web application server bundled with solr—proved adequate and convenient for our needs. the metadata schema used by solr can be customized. we derived ours from the unqualified dublin core metadata element set (dcmes),5 with a few fields removed and some fields added, such as "library" and "department," as well as fields that support various multipac features, such as thumbnail images and primary record urls. dcmes was chosen for its combination of generality, simplicity, and familiarity. in practice, the solr schema is for finding purposes only, so whether it uses a standard schema is of little importance.
user interface
the front-end multipac system is written in php 5.2 in a model-view-controller design based on classical object design principles.
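as an illustration of the search server's role, the sketch below shows the kind of request a front end such as multipac's might send to solr's http api. the host, port, and core name are hypothetical, the facet and fq parameters are standard solr request parameters, and the field names (library, department, format) follow the locally extended dcmes-derived schema described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_SELECT = "http://localhost:8983/solr/multipac/select"  # hypothetical host and core

def search(query, scope_fq=None, rows=20):
    """run a faceted solr search, optionally scoped to a subset of resources."""
    params = [
        ("q", query),
        ("rows", str(rows)),
        ("wt", "json"),               # ask solr for a json response
        ("facet", "true"),
        ("facet.field", "library"),   # locally added, facetable schema fields
        ("facet.field", "department"),
        ("facet.field", "format"),
    ]
    if scope_fq:                      # e.g. 'library:"special collections"'
        params.append(("fq", scope_fq))
    with urlopen(SOLR_SELECT + "?" + urlencode(params)) as response:
        return json.load(response)

# results = search("las vegas springs", scope_fq='library:"special collections"')
# results["response"]["docs"] holds the matching records;
# results["facet_counts"]["facet_fields"] holds the facet lists.
```

scoping a search form to one unit's resources then amounts to fixing the fq value rather than standing up a separate search interface.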
to support modularity, new resources can be added as classes that implement a resource-class interface. the multipac html user interface is composed of five views: search, browse, results, item, and list, which exist to accommodate the finding process illustrated in figure 7. each view uses a custom html template that can be easily styled by nonprogrammer web designers. (needless to say, judging by figures 8–12, they haven’t been.) most dynamic code is encapsulated within dedicated “helper” methods in an attempt to decouple the templates from the rest of the system. output formats, like resources, are modular and decoupled from the core of the system. the html user interface is one of several interfaces available to the multipac system; others include xml and json, which effectively add web services support to all encompassed resources—a feature missing from many of the resources’ own built-in interfaces.6 n search view search view (figure 8) is the simplest view, serving as the “front page.” it currently includes little more than a brief introduction and search field. the search field is not complicated; it is, in fact, possible to include search forms on any webpage and scope them to any subset of resources on the basis of facet queries. for example, a search form could be scoped to las vegas–related resources in special collections, which would satisfy the demand of some library departments for custom search engines tailored to their resources without contributing to the “interface fragmentation” effect discussed in the introduction. (this would require a higher level of metadata quality than we currently have, which will be discussed in depth later.) because search forms can be added to any page, this view is not essential to the multipac system. to improve simplification, it could be easily removed and replaced with, for example, a search form on the library homepage. n browse view browse view (figure 9) is an alternative to search view, intended for situations in which the user lacks a “concrete target” (figure 7). as should be evident by its appearance, figure 7. the information-finding process supported by multipac figure 8. the multipac search view page 176 information technology and libraries | december 2009 this is the least-developed view, simply displaying facet terms in an html unordered list. notice the facet terms in the format field; this is malprocessed, marc– encoded information resulting from a quick-and-dirty extensible stylesheet language (xsl) transformation from marcxml to solr xml. n results view the results page (figure 10) is composed of three columns: 1. the left column displays a facet list—a feature generally found to be highly useful for results-gathering purposes.7 the data in the list is generated by solr and transformed to an html unordered list using php. the facets are configurable; fields can be made “facetable” in the solr schema configuration file. 2. the center column displays results for the current search query that have been provided by solr. thumbnails are available for resources that have them; generic icons are provided for those that do not. currently, the results list displays item title and description fields. some items have very rich descriptions; others have minimal descriptions or no descriptions at all. this happens to be one of several significant metadata quality issues that will be discussed later. 3. 
the right column displays results from nonindexed resources, including any that it would not be feasible to index locally, such as google, our article databases, and so on. multipac displays these resources as collapsed panes that expand when their titles are clicked and initiate an ajax request for the current search query. in a situation in which there might be twenty or more “panes” to load, performance would obviously suffer greatly if each one had to be queried each time the results page loaded. the on-demand loading process greatly speeds up the page load time. currently, the right column includes only a handful of resource panes—as many as could be developed in six weeks alongside the rest of the prototype. it is anticipated that further development would entail the addition of any number of panes—perhaps several dozen. the ease of developing a resource pane can vary greatly depending on the resource. for developerfriendly resources that offer a useful javascript object notation (json) api, it can take less than half an hour. for article databases, which vendors generally take great pains to “lock down,” the task can entail a two-day marathon involving trial-and-error http-request-token authentication and screen-scraping of complex invalid html. in some cases, vendor license agreements may prohibit this kind of use altogether. there is little we can do about this; clearly, one of multipac’s severest limitations is its lack of adeptness at searching these types of “closed” remote resources. n item view item view (figure 11) provides greater detail about an individual item, including a display of more metadata fields, an image, and a link to the item in its primary context, if available. it is expected that this view also would include holdings status information for opac resources, although this has not been implemented yet. the availability of various page features is dependent on values encoded in the item’s solr metadata record. for example, if an image url is available, it will be displayed; if not, it won’t. an effort was made to keep the view logic separate from the underlying resource to improve code and resource maintainability. the page template itself does not contain any resource-dependent conditionals. n list view list view (figure 12), essentially a “favorites” or “cart” view, is so named because it is intended to duplicate the list feature of unlv libraries’ innovative millennium figure 9. the multipac browse view page information discovery insights gained from multipac | dolski 177 opac. the user can click a button in either results view or item view to add items to the list, which is stored in a cookie. although currently not feature-rich, it would be reasonable to expect the ability to send the list as an e-mail or text message, as well as other features. n metadata harvesting system for metadata to be imported into solr, it must first be harvested. in the harvesting process, a custom script checks source data and compares it with local data. it downloads new records, updates stale records, and deletes missing records. not all resources support the ability to easily check for changed records, meaning that the full record set must be downloaded and converted during every harvest. in most cases, this is not a problem; most of our resources (the library catalog excluded) can be fully dumped in a matter of a few seconds each. in a production environment, the harvest scripts would be run automatically every day or so. 
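a rough sketch of the shape such a harvest script might take is given below. fetch_source_records() stands in for whatever dump or api a particular resource offers (an oai-pmh feed, a sql query, an xml export), the solr update url and core name are hypothetical, the field names again follow the dcmes-derived schema, and the sample record values are illustrative only.

```python
import json
from urllib.request import Request, urlopen

SOLR_UPDATE = "http://localhost:8983/solr/multipac/update?commit=true"  # hypothetical core

def fetch_source_records():
    """placeholder for a resource-specific dump; yields dicts keyed by a stable id."""
    return [
        {"id": "catalog:000001", "title": "goldfield: boom town of nevada",
         "description": "", "library": "main library"},   # illustrative record only
    ]

def post_json(payload):
    """send a json payload to solr's update handler and return the http status."""
    request = Request(
        SOLR_UPDATE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return response.status

def harvest(known_ids):
    """push the current source dump to solr and delete records gone from the source."""
    seen, docs = set(), []
    for rec in fetch_source_records():
        seen.add(rec["id"])
        docs.append({
            "id": rec["id"],
            "title": rec.get("title", ""),
            "description": rec.get("description", ""),
            "library": rec.get("library", ""),
        })
    post_json(docs)                        # a bare json array adds or updates documents
    gone = sorted(known_ids - seen)
    if gone:
        post_json({"delete": gone})        # delete-by-id for records no longer at the source
    return len(docs), len(gone)
```

run on a schedule, a script of this shape keeps the index in step with its source without manual intervention, which is all the production scenario described above really requires.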
in practice, every resource is different, necessitating a different harvest script. the open archives initiative protocol for metadata harvesting (oai-pmh) is the protocol that first jumps to mind as being ideal for metadata harvesting, but most of our resources do not support it. ideally, we would modify as many of them as possible to be oai–compliant, but that would still leave many that are out of our hands. either way, a substantial number of custom harvest scripts would still be required. for demonstration purposes, the multipac prototype was seeded with sample data from a handful of diverse resources: 1. a set of 16,000 marc records from our library catalog, which we converted to marcxml and then to solr xml using xsl transformations 2. our locally built las vegas architects and buildings database, a mysql database containing more than 10,000 rows across 27 tables, which we queried and dumped into xml using a php script 3. our locally built special collections database, a smaller mysql database, which we dealt with the same way 4. our contentdm digital collections, which we downloaded via oai-pmh and transformed using another custom xsl stylesheet there are typically a variety of conversion options for each resource. because of time constraints, we simply chose what we expected would be the quickest route for each, and did not pay much attention to the quality of the conversion. n how multipac answers unlv libraries’ discovery questions multipac has essentially proven its capability of solving interface multiplication and fragmentation issues. figure 10. the multipac results view page 178 information technology and libraries | december 2009 by adding a layer of abstraction between resource and patron, it enables us to reference abstract resources instead of their specific implementations—for example, “the library catalog” instead of “the innopac catalog.” this creates flexibility gains with regard to resource provision and deployment. this kind of “pervasive decoupling” can carry with it a number of advantages. first, it can allow us to provide custom-developed services that vendors cannot or do not offer. second, it can prevent service interruptions caused by maintenance, upgrades, or replacement of individual back-end resources. third, by making us less dependent on specific implementations of vendor products—in other words, reducing vendor “lock-in”—it can potentially give us leverage in vendor contract negotiations. because of the breadth of information we offer from our website gateway, we as a library are particularly sensitive about the continued availability of access to our resources at stable urls. when resources are not persistent, patrons and staff need to be retrained, expectations need to be adjusted, and hyperlinks—scattered all over the place—need to be updated. by decoupling abstract resources from their implementations, multipac becomes, in effect, its own persistent uri system, unifying many library resources under one stable uri schema. in conjunction with a url rewriting system on the web server, a resource-based uri schema (figure 13) would be both powerful and desirable.8 n lessons learned in the development of multipac the lessons learned in the development of multipac fall into three main categories, listed here in order of importance. metadata quality considerations quality metadata—characterized by unified schemas; useful crosswalking; and consistent, thorough description—facilitates finding and gathering. 
in practice, a surrogate record is as important as the resource it describes. below a certain quality threshold, its accompanying resource may never be found, in which case it may as well not exist. surrogate record quality influences relevance ranking and can mean the difference between the most relevant result appearing on page 1 or page 50 (relevance, of course, being a somewhat disputed term). solr and similar systems will search all surrogates, including those that are of poor quality, but the resulting relevancy ranking will be that much less meaningful. figure 13. example of an implementation-based vs. resource-based uri implementation-based http://www.library.unlv.edu/arch/archdb2/index.php/projects/view/1509 resource-based (hypothetical) http://www.library.unlv.edu/item/483742 figure 11. the multipac item view page figure 12. the multipac list view page information discovery insights gained from multipac | dolski 179 metadata quality can be evaluated on several levels, from extremely specific to extremely broad (figure 14). that which may appear to be adequate at one level may fail at a higher level. using this figure as an example, multipac requires strong adherence to level 5, whereas most of our metadata fails to reach level 4. a “level 4 failure” is illustrated in table 3, which compares sample metadata records from four different multipac resources. empty cells are not necessarily “bad”— not all metadata elements apply to all resources—but this type of inconsistency multiplies as the number of resources grows, which can have negative implications for retrieval. suggestions for improving metadata quality the results from the multipac project suggest that metadata rules should be applied strictly and comprehensively according to library-wide standards that, at our libraries, have yet to be enacted. surrogate records must be treated as must-have (rather than nice-to-have) features of all resources. resources that are not yet described in a system that supports searchable surrogate records should be transitioned to one that does; for example, html webpages should be transitioned to a content management system with metadata ascription and searchability features (at unlv, this is planned). however, it is not enough for resources to have high-quality metadata if not all schemas are in sync. there exist a number of resources in our library that are well-described but whose schemas do not mesh well with other resources. different formats are used; different descriptive elements figure 14. example scopes of metadata application and evaluation, from broad (top) to specific table 3. comparing sample crosswalked metadata from four different unlv libraries resources library catalog digital collections special collections database las vegas architects & buildings database title goldfield: boom town of nevada map of tonopah mining district, nye county, nevada 0361 : mines and mining collection flamingo hilton las vegas creator paher, stanley w. booker & bradford call number f849.g6p34 contents (item-level description of contents) format digital object photo collections database record language eng eng eng coverage tonopah mining district (nev.) ; ray mining district (nev.) 
description (omitted for brevity) publisher nevada publications university of nevada las vegas libraries unlv architecture studies library subject (lcsh omitted for brevity) (lcsh omitted for brevity) 180 information technology and libraries | december 2009 are used; and different interpretations, however subtle, are made of element meanings. despite the best intentions of everyone involved with its creation and maintenance, and despite the high quality of many of our metadata records when examined in isolation, in the big picture, multipac has demonstrated—perhaps for the first time—how much work will be needed to upgrade our metadata for a discovery system. would the benefits make the effort worthwhile? would the effort be implementable and sustainable given the limitations of the present generation of “silo” systems? what kind of adjustments would need to be made to accommodate effective workflows, and what might those workflows look like? these questions still await answers. of note, all other open-source and vendor systems suffer from the same issues, which is a key reason that these types of systems are not yet ascendant in libraries.9 there is much promise in the ability of infrastructural standards like frbr, skos, rda, and the many other esoteric information acronyms to pave the way for the next generation of library discovery systems. organizational considerations electronic information has so far proved relatively elusive to manage; some of it is ephemeral in existence, most of it is constantly changing, and all of it is from diverse sources. attempts to deal with electronic resources—representing them using catalog surrogate records, streamlining website portals, farming out the problem to vendors—have not been as successful as they have needed to be and suffer from a number of inherent limitations. multipac would constitute a major change in library resource provision. our library, like many, is for the most part organized around a core 1970s–80s ils–support model that is not well adapted to a modern unified discovery environment. next-generation discovery is trending away from assembly-line-style acquisition and processing of primarily physical resources and toward agglomerating interspersed networked and physical resource clouds from onand offsite.10 in this model, increasing responsibilities are placed on all content providers to ensure that their metadata conforms to site-wide protocols that, at our library, have yet to be developed. n conclusion in deciding how to best deal with discovery issues, we found that a traditional product matrix comparison does not address the entire scope of the problem, which is that some of the discoverability inadequacies in our libraries are caused by factors that cannot be purchased. sound metadata is essential for proper functioning of a unified discovery system, and descriptive uniformity must be ensured on multiple levels, from the element level to the institution level. technical facilitators of improved discoverability already exist; the responsibility falls on us to adapt to the demands of future discovery systems. the specific discovery tool itself is only a facilitator, the specific implementation of which is likely to change over time. what will not change are library-wide metadata quality issues that will serve any tool we happen to deploy. 
the multipac project brought to light important library-wide discoverability issues that may not have been as obvious before, exposing a number of limitations in our existing metadata as well as giving us a glimpse of what it might take to improve our metadata to accommodate a next-generation discovery system, in whatever form that might take. references 1. unlv libraries usability committee, internal library website usability testing, las vegas, 2008. 2. karen calhoun, “the changing nature of the catalog and its integration with other discovery tools.” report prepared for the library of congress, 2006. 3. xiaoming liu et al., “federated searching interface techniques for heterogeneous oai repositories,” journal of digital information 4, no. 2 (2002). 4. apache software foundation, apache solr, http://lucene .apache.org/solr/ (accessed june 11, 2009). 5. dublin core metadata initiative, “dublin core metadata element set, version 1.1,” jan. 14, 2008, http://dublincore.org/ documents/dces/ (accessed june 25, 2009). 6. lorcan dempsey, “a palindromic ils service layer,” lorcan dempsey’s weblog, jan. 20, 2006, http://orweblog.oclc .org/archives/000927.html (accessed july 15, 2009). 7. tod a. olson, “utility of a faceted catalog for scholarly research,” library hi tech 4, no. 25 (2007): 550–61. 8. tim berners-lee, “hypertext style: cool uris don’t change,” 1998, http://www.w3.org/provider/style/uri (accessed june 23, 2009). 9. bowen, jennifer, “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1,” information technology and libraries 2, no. 27 (june 2008): 6–19. 10. calhoun, “the changing nature of the catalog.” microsoft word 5699-11611-7-ce.docx geographic  information  and  technologies   in  academic  libraries:  an  arl  survey  of   services  and  support       ann  l.  holstein     information  technology  and  libraries  |  march  2015             38   abstract   one  hundred  fifteen  academic  libraries,  all  current  members  of  the  association  of  research  libraries   (arl),  were  selected  to  participate  in  an  online  survey  in  an  effort  to  better  understand  campus   usage  of  geographic  data  and  geospatial  technologies,  and  how  libraries  support  these  uses.  the   survey  was  used  to  capture  information  regarding  geographic  needs  of  their  respective  campuses,  the   array  of  services  they  offer,  and  the  education  and  training  of  geographic  information  services   department  staff  members.  the  survey  results,  along  with  review  of  recent  literature,  were  used  to   identify  changes  in  geographic  information  services  and  support  since  1997,  when  a  similar  survey   was  conducted  by  arl.  this  new  study  has  enabled  recommendations  to  be  made  for  building  a   successful  geographic  information  service  center  within  the  campus  library  that  offers  a  robust  and   comprehensive  service  and  support  model  for  all  geographic  information  usage  on  campus.   introduction   in  june  1992,  the  arl  in  partnership  with  esri  (environmental  systems  research  institute)   launched  the  gis  (geographic  information  systems)  literacy  project.  
this  project  sought  to   “introduce,  educate,  and  equip  librarians  with  the  skills  necessary”  to  become  effective  gis  users   and  to  learn  how  to  provide  patrons  with  “access  to  spatially  referenced  data  in  all  formats.”1   through  the  implementation  of  a  gis  program,  libraries  can  provide  “a  means  to  have  the   increasing  amount  of  digital  geographic  data  become  a  more  useful  product  for  the  typical   patron.”2     in  1997,  five  years  after  the  gis  literacy  project  began,  a  survey  was  conducted  to  elucidate  how   arl  libraries  support  patron  gis  needs.  the  survey  was  distributed  to  121  arl  members  for  the   purpose  of  gathering  information  about  gis  services,  staffing,  equipment,  software,  data,  and   support  these  libraries  offered  to  their  patrons.  seventy-­‐two  institutions  returned  the  survey,  a  60%   response  rate.  at  that  time,  nearly  three-­‐quarters  (74%)  of  the  respondents  affirmed  that  their   library  administered  some  level  of  gis  services.3  this  indicates  that  the  gis  literacy  project  had  an   evident  positive  impact  on  the  establishment  of  gis  services  in  arl  member  libraries.   since  then,  it  has  been  recognized  that  the  rapid  growth  of  digital  technologies  has  had  a   tremendous  effect  on  gis  services  in  libraries.4  we  acknowledge  the  importance  of  assessing     ann  l.  holstein  (ann.holstein@case.edu)  is  gis  librarian  at  kelvin  smith  library,  case  western   reserve  university,  cleveland,  ohio.     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   39   how  geographic  services  in  academic  research  libraries  have  further  evolved  over  the  past  17   years  in  response  to  these  advancing  technologies  as  well  as  the  increasingly  demanding   geographic  information  needs  of  their  user  communities.     method   for  this  study,  115  academic  libraries,  all  current  members  of  arl  as  of  january  2014,  were   invited  to  participate  in  an  online  survey  in  an  effort  to  better  understand  campus  usage  of   geographic  data  and  geospatial  technologies  and  how  libraries  support  these  uses.  similar  in   nature  to  the  1997  arl  survey,  the  2014  survey  was  designed  to  capture  information  regarding   geographic  needs  of  their  respective  campuses,  the  array  of  services,  software.  and  support  the   academic  libraries  offer,  and  the  education  and  training  of  geographic  information  services   department  staff  members.  our  aim  was  to  be  able  to  determine  the  range  of  support  patrons  can   anticipate  at  these  libraries  and  ascertain  changes  in  gis  library  services  since  the  1997  survey.   a  cross-­‐sectional  survey  was  designed  and  administered  using  qualtrics,  an  online  survey  tool.  it   was  distributed  in  january  2014  via  email  to  the  person  identified  as  the  subject  specialist  for   mapping  and/or  geographic  information  at  each  arl  member  academic  library.  when  the  survey   closed  after  two  weeks,  54  institutions  had  responded  to  the  survey.  this  accounts  for  47%   participation.  responding  institutions  are  listed  in  the  appendix.   
results
software and technologies
we were interested in learning about what types of geographic information software and technologies are currently being offered at academic research libraries. results show that 100% of survey respondents offer gis software/mapping technologies at their libraries, 36% offer remote sensing software (to process and analyze remotely sensed data such as aerial photography and satellite imagery), and 36% offer global positioning system (gps) equipment and/or software. nearly all (98%) said that their libraries provide esri arcgis software, with 83% also providing access to google maps and google earth, and 35% providing qgis (previously known as quantum gis). smatterings of other gis, remote-sensing, and gps products are also offered by some of the libraries, although not in large numbers (see table 1 for full listing). the fact that nearly all survey respondents offer arcgis software at their libraries comes as no surprise. arcgis is the most commonly provided mapping software available in academic libraries, and in 2011 it was determined that 2,500 academic libraries were using esri products.5 esri software was most popular in 1997 as well, undoubtedly because esri offered free software and training to participants of the gis literacy project.6
table 1. geographic information software/mapping technologies provided at arl member academic libraries (2014)
software/technology   type             % of providing libraries
esri arcgis           gis              98
google maps/earth     gis              83
qgis                  gis              35
autocad               gis              19
erdas imagine         remote sensing   19
grass                 gis              15
envi                  remote sensing   15
geoda                 gis              6
pci geomatica         remote sensing   6
garmin map source     gps              6
simplymap             gis              4
trimble terrasync     gps              4
google maps and google earth, launched in 2005, have quickly become very popular mapping products used at academic libraries—a close second only to esri arcgis. in addition to being free, their ease of use, powerful visualization capabilities, and "customizable map features and dynamic presentation tools" make them attractive alternatives to commercial gis software products.7 since 1997, many software programs have fallen out of favor. mapinfo, idrisi, maptitude, and sammamish data finder/geosight pro were gis software programs listed in the 1997 survey results that are not used today at arl member academic libraries.8 instead, open-source software such as qgis, grass, and geoda is growing in popularity; it is free to use, and its source code may be modified as needed. gps equipment lending can be very beneficial to students and campus researchers who need to collect their own field-research locational data. the 2014 survey found that 30% of respondents loan recreational gps equipment at their libraries and 10% loan mapping-grade gps equipment.
the  high  cost  of  mapping-­‐grade  gps  equipment  (several  thousand  dollars)  may  be  a  barrier  for   some  libraries;  however,  this  is  the  type  of  equipment  recommended  in  best-­‐practice  methods  for   gathering  highly  accurate  gps  data  for  research.  in  addition  to  expense,  complexity  of  operation  is   another  consideration.  while  it  is  “fairly  simple  to  use  a  recreational  gps  unit,”  a  certain  level  of   advanced  training  is  required  for  operating  mapping-­‐grade  gps  equipment.9  a  designated  staff   member  may  need  to  take  on  the  responsibility  of  becoming  the  in-­‐house  gps  expert  and  routinely   offer  training  sessions  to  those  interested  in  borrowing  mapping-­‐grade  gps  equipment.     location     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   41   at  36%  of  responding  libraries,  the  geographic  information  services  area  is  located  where  the   paper  maps  are  (map  department/services);  19%  have  separated  this  area  and  designated  it  as  a   geospatial  data  center,  gis,  or  data  services  department;  13%  integrate  it  with  the  reference   department;  and  just  4%  of  libraries  house  the  gis  area  in  government  documents.  table  2  lists  all   reported  locations  for  this  service  area.  not  surprisingly,  in  1997,  government  documents  (39%)   was  just  as  popular  a  location  for  this  service  area  as  within  the  map  department  (43%).10   libraries  identified  government  documents  as  a  natural  fit,  keeping  gis  services  within  close   proximity  to  spatial  data  sets  recently  being  distributed  by  government  agencies,  most  notably  the   us  government  printing  office  (gpo).  these  agencies  had  made  the  decision  to  distribute  “most   data  in  machine  readable  form,”11  including  the  1990  census  data  as  topographically  integrated   geographic  encoding  and  referencing  (tiger)  files.12  gis  technologies  were  needed  to  access  and   most  effectively  use  information  within  these  massive  spatial  datasets.     location   %  of  libraries  (1997)   %  of  libraries  (2014)   map  department/services   43   36   government  documents   39   4   reference   10   13   geospatial  data  center,  gis,  or  data  services   3   19   not  in  any  one  location   -­‐   9   digital  scholarship  center   -­‐   6   combined  area  (i.e.,  map  dept.  &  gov.  docs.)   -­‐   6   table  2.  location  of  the  geographic  information  services  area  within  the  library  (1997  and  2014)   at  59%  of  responding  libraries,  geographic  information  software  is  available  on  computer   workstations  in  a  designated  area,  such  as  within  the  map  department.  however,  many  do  not   restrict  users  by  location  and  have  the  software  available  on  all  computer  workstations   throughout  the  library  (37%)  or  on  designated  workstations  distributed  throughout  the  library   (33%).  a  small  percentage  (7%)  loan  laptops  to  patrons  with  the  software  installed,  allowing  full   mobility  throughout  the  entire  library  space.   staffing   most  professional  staff  working  in  the  geographic  information  services  department  hold  one  or   more  postbaccalaureate  advanced  degrees.  
of  113  geographic  services  staff  at  responding   libraries,  65%  had  obtained  an  ma/ms,  mls/mlis,  or  phd;  43%  have  one  advanced  degree,  while   22%  have  two  postbaccalaureate  degrees.  half  (50%)  hold  an  mls/mlis,  31%  hold  an  ma/ms,   and  6%  hold  a  phd.  nearly  one-­‐third  (31%)  have  obtained  a  ba/bs  as  their  highest  educational   degree,  3%  had  a  two-­‐year  technical  degree,  and  2%  had  only  earned  a  ged  or  high  school   diploma.  in  1997,  84%  of  gis  librarians  and  specialists  at  arl  libraries  had  an  mls  degree.13  at   that  time,  the  incumbent  was  most  often  recruited  from  within  the  library  to  assume  this  new  role,     information  technology  and  libraries  |  march  2015   42   whereas  today’s  gis  professionals  are  just  as  likely  to  come  from  nonlibrary  backgrounds,   bringing  their  expertise  and  advanced  geographic  training  to  this  nontraditional  librarian  role.     figure  1.  highest  educational  degree  of  geographic  services  staff  (2014)   on  average,  this  department  is  staffed  by  two  professional  staff  members  and  three  student  staff.   student  employees  can  be  a  terrific  asset,  especially  if  they  have  been  previously  trained  in  gis.   students  are  likely  to  be  recruited  from  departments  that  are  the  heaviest  gis  users  at  the   university  (i.e.,  geography,  geology).  some  libraries  have  implemented  “co-­‐op”  programs  where   students  can  receive  credit  for  working  at  the  gis  services  area.  these  dual-­‐benefit  positions  are   quite  lucrative  to  students.14     campus  users   in  a  typical  week  during  the  course  of  a  semester,  responding  libraries  each  serve  approximately   sixteen  gis  users,  four  remote  sensing  users,  and  three  gps  users.  these  users  may  obtain   assistance  from  department  staff  either  in-­‐person  or  remotely  via  phone  or  email.     on  average,  undergraduate  and  graduate  students  compose  the  majority  (75%)  of  geographic   service  users  (32%  and  43%,  respectively).  faculty  members  compose  14%  of  the  users,  followed   by  staff  (including  postdoctoral  researchers)  at  7%.  some  institutions  also  provide  support  to   public  patrons  and  alumni  (4%  and  1%,  respectively).  in  1997,  it  was  estimated  that  on  average,   63%  of  gis  users  were  students,  22%  were  faculty,  8%  were  staff,  and  8%  were  public.15   ged/hs   2%   2yr  tech   3%   ba/bs   31%   ma/ms/mlis   58%   phd   6%     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   43     figure  2.  comparison  of  the  percentage  of  geographic  service  users  by  patron  status  (1997  and   2014)   the  top  three  departments  that  use  gis  software  at  arl  campuses  are  environmental   science/studies,  urban  planning/studies,  and  geography.  the  most  frequent  remote  sensing   software  users  come  from  the  departments  of  environmental  science/studies,  geography,  and   archaeology.  gps  equipment  loan  and  software  usage  is  most  popular  with  the  departments  of   environmental  science/studies,  geography,  biology/ecology  and  archaeology  (see  table  3  for  full   listing).  some  departments  are  heavy  users  of  all  geographic  technologies,  while  others  have   shown  interest  in  only  one.  
for example, the departments of psychology and medicine/dentistry have used gis but have expressed little or no interest in using remote-sensing or gps technologies.

support and services

the campus community is supported by library staff in a variety of ways with regard to gis, remote-sensing, and gps technology and software use. nearly all (94%) libraries provide assistance using the software for specific class assignments and projects, and 78% are able to provide more in-depth research project consultations. more than one-quarter (27%) of reporting libraries will make custom gis maps for patrons, although there may be a charge depending on the library, project, and patron type (10%). most (90%) offer basic use and troubleshooting support; however, just 39% offer support for software installation, and 55% offer technical support for problems such as licensing issues and turning on extensions. the campus computing center or information technology services (its) at arl institutions most likely fields some of the software installation and technical issues rather than the library, thus accounting for the lower percentages.

a variety of software training may be offered to the campus community through the library; 80% of responding libraries make visits to classes to give presentations and training sessions, 69% host workshops, 47% provide opportunities for virtual training courses and tutorials, and 4% offer certificate training programs.

department | gis | remote sensing | gps
anthropology | 24 | 10 | 8
archaeology | 24 | 14 | 13
architecture | 24 | 1 | 6
biology/ecology | 32 | 10 | 13
business/economics | 23 | 1 | 3
engineering | 18 | 9 | 11
environmental science/studies | 41 | 22 | 16
forestry/wildlife/fisheries | 21 | 12 | 10
geography | 35 | 22 | 15
geology | 31 | 12 | 10
history | 27 | 2 | 2
information sciences | 14 | 1 | 0
nursing | 8 | 1 | 2
medicine/dentistry | 9 | 0 | 0
political science | 25 | 3 | 5
psychology | 4 | 0 | 0
public health/epidemiology/biostatistics | 30 | 3 | 9
social work | 2 | 0 | 1
sociology | 22 | 0 | 3
soil science | 17 | 5 | 4
statistics | 8 | 3 | 0
urban planning/studies | 36 | 7 | 9

table 3. number of arl libraries reporting frequent users of gis, remote-sensing, or gps software and technologies from a campus department (2014)

often, the library is not the only place people can go to obtain software support and training on campus. most (86%) responding libraries state that their university offers credit courses, and 41% of campuses have a gis computer lab located elsewhere on campus that may be utilized. its is available for assistance at 29% of the universities, and continuing education offers some level of training and support at 14% of campuses.

data collection and access

most (85%) of responding libraries collect geographic data and allocate an annual budget for it.
“libraries that have invested money in proprietary software and trained staff members will tend to also develop and maintain their own collection of data resources.”16 of those collecting data, 26% spend less than $1,000 annually, 15% spend between $1,000 and $2,499, 17% spend between $2,500 and $5,000, while 41% spend more than $5,000. in 1997, 79% of libraries spent less than $2,000 annually, and only 9% spent more than $5,000.17

figure 3. annual budget allocations for geographic data (2014)

a dramatic shift has occurred over the years in budget allocations for data sets. no longer are academic libraries just collecting free government data sets, as was typically the case in 1997; they are investing much more of their materials budgets in building up geographic data collections for their users.

data is made accessible to campus users in a variety of ways. a majority (84%) offer data via remote access or download from a networked campus computer, using a virtual private network (vpn) or login. more than half (62%) of responding libraries provide access to data from workstations within the library, and 64% lend cd-roms.

roughly one-quarter (26%) of responding libraries provide users with storage for their data. of those, 29% have a dedicated geographic data server, 14% use the main library server, 29% point users to the university server or institutional repository, and 36% allow users to store their data directly onto a library computer workstation hard drive.

internal use of gis in libraries

geographic information technologies may be used internally to help patrons navigate the library’s physical collections and efficiently locate print materials. of the survey respondents, 60% use gis for map or air photo indexing, 27% use the technology to create floor maps of the library building, and 15% use it to map the library’s physical collections. “the use of gis in mapping library collections is one of the non-traditional but useful applications of gis.”18 gis can be used to link library materials to simulated views of floor maps through location codes.19 this enables patrons to determine the exact location of library material by providing them with item “location details such as stacks, row, rack, shelf numbers, etc.”20 the gis system can become a useful tool for collection management and can be a tremendous time-saver for patrons, especially those unfamiliar with the cataloging system or collection layout.

discussion

recommendations for building a successful geographic information service center

the geographic information services area is often a blend of the traditional and modern. it can extend to paper maps, atlases, gps equipment, software manuals, large-format scanners, printers, and gis.
gis services may include a cluster of computers with gis software installed, an accessible collection of gis data resources, and assistance available from the library staff. the question for academic libraries today is no longer “whether to offer gis services but what level of service to offer.”21 every university has different gis needs, and the library must decide how it can best support these needs. there is no set formula for building a geographic information service center because each institution “has a different service mission and user base.”22 every library’s gis service program will be designed with its unique institutional needs in mind; however, each will incorporate some combination of hardware, software, data, and training opportunities provided by at least one knowledgeable staff member.23

“gis represents a significant investment in hardware, software, staffing, data acquisition, and ongoing staff development. either new money or significant reallocation is required.”24 establishing new gis services in the library, or enhancing existing ones, requires the “serious assessment of long-term support and funding needs.”25 commitment of the university as a whole, or at least support from senior administration, “library administration, and related campus departments,” is crucial to its success.26 receiving “more funding will mean more staff, better trained staff, a more in-depth collection, better hardware and software, and the ability to offer multiple types of gis services.”27

once funding for this endeavor has been secured, it is of utmost importance to recruit a gis professional to manage the geographic information service center. to be most effective in this position, the incumbent should possess a graduate degree in gis or geography; however, depending on what additional responsibilities would be required of the candidate (e.g., reference or cataloging), a second degree in library science is strongly recommended. this staff member should possess mapping and gis skills, which include experience with esri software and remote sensing technologies. employees in this position may be given job titles such as “gis specialists, gis/data librarians, gis/map librarians, digital cartographers, spatial data specialists, and gis coordinators.”28

with the new staff member on board, hereafter referred to as “gis specialist,” decisions such as what software to provide, which data sets to collect, and what types of training and support to offer to the campus can be made. consulting with research centers and academic departments that currently use or are interested in using gis and remote sensing technologies is a good place to learn about software, data, and training needs and to determine the focus and direction of the geographic information services department.29 campus users often come from academic departments that “have neither staff nor facilities to support gis,” and “may only consist of one or two faculty and a few graduate students.
these gis users need access to software, data, and expertise from a centralized, accessible source of research assistance, such as the library.”30

at minimum, esri arcgis, google maps, and google earth should be supported, with additional remote sensing or open source gis software depending on staff expertise and known campus needs. when purchasing commercial software licenses, such as for esri arcgis, discounts for educational institutions are usually available. additionally, negotiating campus-wide software licenses may be a good option to consider, as the costs are usually far less than purchasing individual or floating licenses. costs for campus-wide licensing are typically determined by the number of full-time equivalent (fte) students enrolled at the university.

facilitating “access to educational resources such as software tools and applications, how-to guides for data and software,” and tutorials is crucial.31 the gis specialist must be familiar with how gis software can be used by many disciplines, the availability of “training courses or tutorials, sources of extensible gis software, and hundreds of software and application books.”32 tutorials may be provided directly from a software vendor (e.g., esri virtual campus) or developed in-house by the gis specialist. creating “gis tutorials on short, task-based techniques such as georeferencing or geocoding” and making them readily available online or as a handout may save staff time otherwise spent repeatedly explaining these techniques to patrons.33

geospatial data collection development is a core function of the geographic information services department. to effectively develop the data collection, the gis specialist must fully comprehend the needs of the user community as well as possess a “fundamental understanding of the nature and use of gis data.”34 this is often referred to as “spatial literacy.”35 it is crucial to keep abreast of “recent developments, applications, and data sets.”36

the gis specialist will spend much more time searching for and acquiring geographic data sets than selecting and purchasing traditional print items such as maps, monographs, and journals for the collection. a budget should be established annually for the purchase of all geographic materials, both print and digital. a great challenge for the specialist is to acquire data at the lowest cost possible. while a plethora of free data is available online from government agencies and nonprofit organizations, other data, available only from private companies, may be quite expensive because of the high production costs. a collection development policy should be created that indicates the types of materials and data collected and specifies geographic regions, formats, and preferred scales.37 the needs of the user community must be carefully considered when establishing the policy.
the expertise of the gis specialist is needed not only to help patrons locate the appropriate geographic data, but also to use the software to process, interpret, and analyze it. “only the few library patrons that have had gis experience are likely to obtain any level of success without intervention by library staff”;38 thus, for any mapping program installed on a library computer, “staff must have working knowledge of the program” and must be able to provide support to users.39 furthermore, the gis specialist must be able to train patrons to use the software to complete common tasks such as file format conversion, data projection, data manipulation, and geoprocessing. these geospatial technologies involve a steep learning curve, and unfortunately “hands-on training options outside the university are often cost-prohibitive” for many.40 the campus community requires training opportunities that are both convenient and inexpensive.

teaching hands-on geospatial technology workshops, from basic to advanced, is fundamental to educating the campus community. workshops will “vary from institution to institution, with some offering students an introduction to mapping and others focusing on specific features of the program, such as georeferencing, geocoding, and spatial analysis. some also offer workshops that are theme specific,” such as “working with census data” or “digital elevation modeling.”41 custom workshops or training sessions can be developed to meet a specific campus need, tailored for a specific class in consultation with an instructor, or designed especially for other library staff.

today’s geographic information service center

the academic map librarian from the 1970s or 1980s would hardly recognize today’s geographic information service center. what was once a room of map cases and shelves of atlases and gazetteers is now a bustling geospatial center. computers, powerful gis and remote-sensing technologies, gps devices, digital maps, and data are now available to library patrons. every library surveyed provides gis software to campus users, and 85% also actively collect gis and remotely sensed data. with the assistance of expertly trained library staff, users with no or limited experience using geospatial technologies can analyze spatial data sets and create custom maps for coursework, projects, and research. nearly all surveyed libraries (94%) have staff that can assist students specifically with software use for class assignments and projects, while 90% provide assistance with more generalized use of the software. a majority of libraries also offer a variety of software training sessions and workshops and give presentations to the campus community. all this is made possible through the library’s commitment to this service area and the availability of highly trained professional staff, most of whom hold a master’s or doctoral degree. the library has truly established itself as the go-to location on campus for spatial mapping and analysis.
this role has only strengthened in the years since the launch of the arl gis literacy project in 1992.

references

1. d. kevin davie et al., comps., spec kit 238: the arl geographic information systems literacy project (washington, dc: association of research libraries, office of leadership and management services, 1999), 16.
2. ibid., 3.
3. ibid., i.
4. abraham parrish, “improving gis consultations: a case study at yale university library,” library trends 55, no. 2 (2006): 328, http://dx.doi.org/10.1353/lib.2006.0060.
5. eva dodsworth, getting started with gis: a lita guide (new york: neal-schuman, 2012), 161.
6. davie et al., spec kit 238, i.
7. eva dodsworth and andrew nicholson, “academic uses of google earth and google maps in a library setting,” information technology & libraries 31, no. 2 (2012): 102, http://dx.doi.org/10.6017/ital.v31i2.1848.
8. davie et al., spec kit 238, 8.
9. gregory h. march, “surveying campus gis and gps users to determine role and level of library services,” journal of map & geography libraries 7, no. 2 (2011): 170–71, http://dx.doi.org/10.1080/15420353.2011.566838.
10. davie et al., spec kit 238, 5.
11. george j. soete, spec kit 219: transforming libraries issues and innovation in geographic information systems (washington, dc: association of research libraries, office of management services, 1997), 5.
12. camila gabaldón and john repplinger, “gis and the academic library: a survey of libraries offering gis services in two consortia,” issues in science and technology librarianship 48 (2006), http://dx.doi.org/10.5062/f4qj7f8r.
13. davie et al., spec kit 238, 5.
14. soete, spec kit 219, 9.
15. davie et al., spec kit 238, 10.
16. dodsworth, getting started with gis, 165.
17. davie et al., spec kit 238, 9.
18. d. n. phadke, geographical information systems (gis) in library and information services (new delhi: concept, 2006), 36–37.
19. ibid., 13.
20. ibid., 74.
21. rhonda houser, “building a library gis service from the ground up,” library trends 55, no. 2 (2006): 325, http://dx.doi.org/10.1353/lib.2006.0058.
22. melissa lamont and carol marley, “spatial data and the digital library,” cartography and geographic information systems 25, no. 3 (1998): 143, http://dx.doi.org/10.1559/152304098782383142.
23. carolyn d. argentati, “expanding horizons for gis services in academic libraries,” journal of academic librarianship 23, no. 6 (1997): 463, http://dx.doi.org/10.1559/152304098782383142.
24. soete, spec kit 219, 11.
25. carol cady et al., “geographic information services in the undergraduate college: organizational models and alternatives,” cartographica 43, no. 4 (2008): 249, http://dx.doi.org/10.3138/carto.43.4.239.
26. houser, “building a library,” 325.
27. r. b. parry and c. r. perkins, eds., the map library in the new millennium (chicago: american library association, 2001), 59–60.
28. patrick florance, “gis collection development within an academic library,” library trends 55, no. 2 (2006): 223, http://dx.doi.org/10.1353/lib.2006.0057.
29. houser, “building a library,” 325.
30. ibid., 323.
31. ibid., 322.
32. parrish, “improving gis,” 329.
33. ibid., 336.
34. florance, “gis collection development,” 222.
35. soete, spec kit 219, 6.
36. dodsworth, getting started with gis, 165.
37. soete, spec kit 219, 8.
38. gabaldón and repplinger, “gis and the academic library.”
39. dodsworth, getting started with gis, 164.
40. houser, “building a library,” 323.
41. dodsworth, getting started with gis, 161–62.

appendix

responding institutions

arizona state university libraries
auburn university libraries
boston college libraries
university of calgary libraries and cultural resources
university of california, los angeles, library
university of california, riverside, libraries
university of california, santa barbara, libraries
case western reserve university libraries
colorado state university libraries
columbia university libraries
university of connecticut libraries
cornell university library
dartmouth college library
duke university library
university of florida libraries
georgetown university library
university of hawaii at manoa library
university of illinois at chicago library
university of illinois at urbana-champaign library
indiana university libraries bloomington
johns hopkins university libraries
university of kansas libraries
mcgill university library
university of manitoba libraries
university of maryland libraries
massachusetts institute of technology libraries
university of miami libraries
university of michigan library
michigan state university libraries
university of nebraska–lincoln libraries
new york university libraries
university of north carolina at chapel hill libraries
north carolina state university libraries
northwestern university library
university of oregon libraries
university of ottawa library
university of pennsylvania libraries
pennsylvania state university libraries
purdue university libraries
queen’s university library
rice university library
university of south carolina libraries
university of southern california libraries
syracuse university library
university of tennessee, knoxville, libraries
university of texas libraries
texas tech university libraries
university of toronto libraries
tulane university library
vanderbilt university library
university of waterloo library
university of wisconsin–madison libraries
yale university library
york university libraries

using ajax to empower dynamic searching

judith wusteman and pádraig o’hiceadha

judith wusteman (judith.wusteman@ucd.ie) is a lecturer in the ucd school of information and library studies, university college dublin, ireland.

the use of ajax, or asynchronous javascript + xml, can result in web applications that demonstrate the flexibility, responsiveness, and usability traditionally found only in desktop software. to illustrate this, a repository metasearch user interface, ojax, has been developed. ojax is simple, unintimidating but powerful.
it attempts to minimize upfront user investment and provide immediate dynamic feedback, thus encouraging experimentation and enabling enactive learning. this article introduces the ajax approach to the development of interactive web applications and discusses its implications. it then describes the ojax user interface and illustrates how it can transform the user experience.

with the introduction of the ajax development paradigm, the dynamism and richness of desktop applications become feasible for web-based applications. ojax, a repository metasearch user interface, has been developed to illustrate the potential impact of ajax-empowered systems on the future of library software.1 this article describes the ajax method, highlights some uses of ajax technology, and discusses the implications for web applications. it goes on to illustrate the user experience offered by the ojax interface.

■ ajax

in february 2005, the term ajax acquired an additional meaning: asynchronous javascript + xml.2 the concept behind this new meaning, however, has existed in various forms for several years. ajax is not a single technology but a general approach to the development of interactive web applications. as the name implies, it describes the use of javascript and xml to enable asynchronous communication between browser clients and server-side systems.

as explained by garrett, the classic web application model involves user actions triggering a hypertext transfer protocol (http) request to a web server.3 the latter processes the request and returns an entire hypertext markup language (html) page. every time the client makes a request to the server, it must wait for a response, thus potentially delaying the user. this is particularly true for large data sets. but research demonstrates that response times of less than one second are required when moving between pages if unhindered navigation is to be facilitated through an information space.4

the aim of ajax is to avoid this wait. the user loads not only a web page, but also an ajax engine written in javascript. users interact with this engine in the same way that they would with an html page, except that instead of every action resulting in an http request for an entire new page, user actions generate javascript calls to the ajax engine. if the engine needs data from the server, it requests this asynchronously in the background. thus, rather than requiring the whole page to be refreshed, the javascript can make rapid incremental updates to any element of the user interface via brief requests to the server. this means that the traditional page-based model used by web applications can be abandoned; hence, the pacing of user interaction with the client becomes independent of the interaction between client and server.

xmlhttprequest is a collection of application programming interfaces (apis) that use http and javascript to enable transfer of data between web servers and web applications.5 initially developed by microsoft, xmlhttprequest has become a de facto standard for javascript data retrieval and is implemented in most modern browsers. it is commonly used in the ajax paradigm. the data accessed from the http server is usually in extensible markup language (xml) but another format, such as javascript object notation, could be used.6

applications of ajax

google is the most significant user of ajax technology to date.
most of its recent innovations, including gmail, google suggest, google groups, and google maps, employ the paradigm.7 the use of ajax in google suggest improves the traditional google interface by offering real-time suggestions as the user enters a term in the search field. for example, if the user enters xm, google suggest might offer refinements such as xm radio, xml, and xmods. experimental ajax-based auto-completion features are appearing in a range of software.8 shanahan has applied the same ideas to the amazon online bookshop.9 his experimental site, zuggest, extends the concept of auto-completion: as the user enters a term, the system automatically triggers a search without the need to hit a search button.

the potential of ajax to improve the responsiveness and richness of library applications has not been lost on the library community.10 several interesting experiments have been tried. at oclc, for example, a “suggest-like service,” based on controlled headings from the worldwide union catalog, worldcat, has been implemented.11 ajax has also been used in the oclc deweybrowser.12 the main page of this browser includes four iframes, or inline frames, three for the three levels of dewey decimal classification and a fourth for record display.13 the use of ajax allows information in each iframe to be updated independently without having to reload the entire page.

implications of ajax

there have been many attempts to enable asynchronous background transactions with a server. among alternatives to ajax are flash, java applets, and the new breed of xml user-interface language formats such as xml user interface language (xul) and extensible application markup language (xaml).14 these all have their place, particularly languages such as xul. the latter is ideal for use in mozilla extensions, for example. combinations of the above can be, and are being, used together; xul and ajax are both used in the firefox extension version of google suggest.15 the main advantage of ajax over these alternative approaches is that it is nonproprietary and is supported by any browser that supports javascript and xmlhttprequest—hence, by any modern browser.

it could be validly argued that complex client-side javascript is not ideal. in addition to the errors to which complex scripting can be prone, there are accessibility issues. best practice requires that javascript interaction adds to the basic functionality of web-based content, which must remain accessible and usable without the javascript.16 an alternative non-javascript interface to gmail was recently implemented to deal with just this issue. a move away from scripting would, in theory, be a positive step for the web. in practice, however, procedural approaches continue to be more popular; attempts to supplant them, as epitomized by xhtml 2.0, simply alienate developers.17

it might be assumed that the use of ajax technology would result in a heavier network load due to an increase in the number of requests made to the server. this is a misconception in most cases. indeed, ajax can dramatically reduce the network load of web applications, as it enables them to separate data from the graphical user interface (gui) used to display it.
for example, each results page presented by a traditional search engine delivers not only the results data but also the html required to render the gui for that page. an ajax application could deliver the gui just once and, after that, deliver data only. this would also be possible via the careful use of frames; the latter could be regarded as an ajax-style technology but without all of ajax’s advantages.

■ from client-server to soa

the dominant model for building network applications is the client/server approach, in which client software is installed as a desktop application and data generally reside on a server, usually in a database.18 this can work well in a homogeneous single-site computing environment. but institutions and consortia are likely to be heterogeneous and geographically distributed. pcs, macs, and cell phones will all need access to the applications, and linux may require support alongside windows. even if an organization standardizes solely on windows, different versions of the latter will have to be supported, as will multiple versions of those ubiquitous dynamic link libraries (dlls). indeed, the problems of obtaining and managing conflicting dlls have spawned the term “dll hell.”19

in web applications, a standard client, the browser, is installed on the desktop but most of the logic, as well as the data, resides on the server. of course, the browser developers still have to worry about “dll hell,” but this need not concern the rest of us. “speed must be the overriding design criterion” for web pages.20 but the interactivity and response times possible with client/server applications are still not available to traditional web applications. this is where ajax comes in: it offers, to date, the best of the web application and client/server worlds. much of the activity is moved back to the desktop via client-side code. but the advantages of web applications are not lost: the browser is still the standard client.

service-oriented architecture (soa) is an increasingly popular approach to the delivery of applications to heterogeneous computing environments and geographically dispersed user populations.21 soa refers to the move away from monolithic applications toward smaller, reusable services with discrete functionality. such services can be combined and recombined to deliver different applications to users. web services is an implementation of soa principles.22 the term describes the use of technologies such as xml to enable the seamless interoperability of web-based applications. ajax enables web services and hence enables soa principles. thus, the adoption of ajax facilitates the move toward soa and all the advantages of reuse and integration that this offers.

■ arc

arc is an experimental open-source metasearch package available for download from the sourceforge open-source foundry.23 it can be configured to harvest open archives initiative protocol for metadata harvesting (oai-pmh)-compliant data from multiple repositories.24 the harvested results are stored in a relational database and can be searched using basic web forms. arc’s advanced search form is illustrated in figure 1.

■ applying ajax to the search gui

the use of ajax has the potential to narrow the gulf between the responsiveness of guis for web applications and those for desktop applications. the flexibility, usability, and richness of the latter are now possible for the former.
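the request cycle underlying this approach can be illustrated with a short javascript sketch. the sketch below is not code from ojax or from any of the libraries discussed in this article; the /search url, the <record> xml elements carrying a title attribute, and the results element id are hypothetical placeholders used only to show the asynchronous xmlhttprequest pattern described above.

// a minimal sketch of the ajax pattern: the page is never reloaded;
// a background request fetches xml and only one part of the interface
// is redrawn. the url "/search" and the element id "results" are
// hypothetical, not part of ojax.
function runSearch(term) {
  var request = new XMLHttpRequest();
  request.open("GET", "/search?q=" + encodeURIComponent(term), true); // true = asynchronous
  request.onreadystatechange = function () {
    if (request.readyState === 4 && request.status === 200) {
      // incremental update: only the results element changes
      var records = request.responseXML.getElementsByTagName("record");
      var target = document.getElementById("results");
      target.innerHTML = "";
      for (var i = 0; i < records.length; i++) {
        var item = document.createElement("li");
        item.textContent = records[i].getAttribute("title");
        target.appendChild(item);
      }
    }
  };
  request.send(null);
}

because the request is asynchronous, the user can keep interacting with the page while the reply is awaited; only the results element is redrawn when it arrives.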
the ojax gui, illustrated in figure 2, has been developed to demonstrate how ajax can improve the richness of arc-like guis. ojax, including full source code, is available under the open-source apache license and is hosted on sourceforge.25 ojax comprises a client-side gui, implemented in javascript and html, and server-side metasearch web services, implemented in java. the web services connect directly to a metasearch database created by arc from harvested repositories. the database connectivity leverages several libraries from the apache jakarta project, which provides open-source java solutions.26

■ development process

the ojax gui was developed iteratively using agile software development methods.27 features were added incrementally and feedback gained from a proxy user. in order to gain an in-depth understanding of the system and the implications for the remainder of the gui, features were initially built from scratch, using object-oriented javascript. they were then rebuilt using three open-source javascript libraries: prototype, script.aculo.us, and rico.28 prototype provides base ajax capability. it also includes advanced functionality for object-oriented javascript, such as multiple inheritance. the other two libraries are built on top of prototype. the script.aculo.us library specializes in dynamic effects, such as those used in auto-completion. the rico library, developed by sabre, provides other key javascript effects—for example, dynamic scrollable areas and dynamic sorting.29

■ storyboard

one of the aims of the national information standards organization (niso) metasearch initiative is to enable all library users to “enjoy the same easy searching found in web-based services like google.”30 adopting this approach, ojax incorporates the increasingly common concept of the search bar, popularized by the google toolbar.31 ojax aims to be as simple, uncluttered, and unthreatening as possible. the goal is to reflect the simple-search experience while, at the same time, providing the power of an advanced search. thus, the user interface has been kept as simple as possible while maintaining equivalent functionality with the arc advanced search interface. all arc functionality, with the exception of the grouping feature, is provided. to help the intuitive flow of the operation, the fields are set out as a sentence:

find [term(s)] in [all archives] from [earliest year] until [this year] in [all subjects]

tool tips are available for text-entry fields. by default, searching is on author, title, and abstract. these fields map to the creator, title, and description dublin core metadata fields harvested from the original repositories.32 the search can be restricted by deselecting unwanted fields. arc supports both mysql and oracle databases.33 mysql has been chosen for ojax as mysql is an open-source database. boolean search syntax has been implemented in ojax to allow for more powerful searching. the syntax is similar to that used by google in that it identifies and/or and exact phrase functionality by +/and “ ”. hence it preserves the user’s familiarity with basic google search syntax.

figure 1. arc’s advanced search form

figure 2. the ojax metasearch user interface
however, it is not as powerful as the full google search syntax; for example, it does not support query modifiers such as intitle:.34 the focus of this research is the application of ajax to the search gui and not the optimization of the power or expressive capability of the underlying search engine. however, the implementation of an alternative back end that uses a full-text search engine, such as apache lucene, would improve the expressive power of advanced queries.35 full-text search expressiveness is likely to be key to the usability of ojax, ensuring its adequacy for the advanced user without alienating the novice.

■ unifying the user interface

one of the main aims of ojax is the unification of the user interface. instead of offering distinct options for simple and advanced search and for refining a completed search, the interface is sufficiently dynamic to make this unnecessary. the user need never navigate between pages because all options, both simple and advanced, are available from the same page. and all results are made available on that same page in the form of a scrollable list. the only point at which a new page is presented is when the resource identifier of a result is clicked. at this stage, a pop-up window, external to the ojax session, displays the full metadata for that resource. this page is generated by the external repository from which the record was originally harvested.

simple and advanced search options are usually kept separate because most users are unwilling or unable to use the latter.36 furthermore, the design of existing search-user interfaces is based on the assumption that the retrieval of results will be sufficiently time-consuming that users will want to have selected all options beforehand. with ojax, however, users do not have to make a complete choice of all the options they might want to try before they see any results. as data are entered, answers flow to accommodate them. because the interface is so dynamic and responsive and because users are given immediate feedback, they do not have to be concerned about wasting time due to the wrong choice of search options. users iterate toward the search results they require by manipulating the results in real time. the reduced level of investment that users must make before they achieve any return from the system should encourage them to experiment, hence promoting enactive learning.

■ auto-completion

in order to provide instant feedback to the user, the search-terms field and the subject field use ajax to auto-complete user entries. figure 3 illustrates the result of typing smith in the search-terms field. a list is automatically dropped down that itemizes all matches and the number of their occurrences. users select the term they want, the entire field is automatically completed, and a search is triggered.

the arc system denormalizes some of the harvested data before saving them in its database. for example, it merges all the author fields into one single field, each name separated by a bar character. to enable the ojax auto-completion feature, it was necessary to renormalize the names. a new table is used to store each name in a separate row; names are referenced by the resource identifier. to enable this, arc’s indexing code was updated so that it creates this table as it indexes records extracted from the oai-pmh feed. in its initial implementation, ojax uses a simple algorithm for auto-completion.
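the auto-completion behaviour just described can be sketched in a few lines of javascript. this is not the ojax or script.aculo.us implementation: the /complete url, the <match> elements with term and count attributes, and the element ids are all assumptions made for illustration, and runSearch is the function sketched earlier.

// a sketch of auto-completion: each keystroke asks the server for matching
// terms and their occurrence counts, shows them in a drop-down list, and
// completes the field and triggers a search when one is chosen.
function attachAutoComplete(fieldId, listId) {
  var field = document.getElementById(fieldId);
  var list = document.getElementById(listId);
  field.onkeyup = function () {
    var prefix = field.value;
    if (prefix.length < 2) { list.innerHTML = ""; return; }
    var request = new XMLHttpRequest();
    request.open("GET", "/complete?prefix=" + encodeURIComponent(prefix), true);
    request.onreadystatechange = function () {
      if (request.readyState !== 4 || request.status !== 200) { return; }
      list.innerHTML = "";
      var matches = request.responseXML.getElementsByTagName("match");
      for (var i = 0; i < matches.length; i++) {
        var term = matches[i].getAttribute("term");
        var count = matches[i].getAttribute("count");
        var option = document.createElement("li");
        option.textContent = term + " (" + count + ")";
        option.onclick = (function (t) {
          return function () {
            field.value = t;      // complete the entire field
            list.innerHTML = "";
            runSearch(t);         // trigger a search, as in the earlier sketch
          };
        })(term);
        list.appendChild(option);
      }
    };
    request.send(null);
  };
}

a production version would also debounce keystrokes and support keyboard navigation of the list; those details are omitted here for brevity.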
future work will involve developing a more complex heuristic that will return results more closely satisfying user requirements.

■ auto-search

as already mentioned, a central theme of ojax is the attempt to reduce the commitment necessary from users before they receive feedback on their actions. one way in which dynamic feedback is provided is the triggering of an immediate search whenever an entire option has been selected. examples of entire options include choice of an archive or year and acceptance of a suggested auto-completion. in addition, the following heuristics are used to identify when a user is likely to have finished entering a search term and, thus, when a search should be triggered:

1. entering a space character in the search-terms field or subject field
2. tabbing out of a field after having modified its contents
3. five seconds of user inactivity for a modified field

the third heuristic aims to catch some of the edge cases that the other heuristics may miss. it is assumed likely that a term has been completed if a user has made no edits in the last five seconds. as each term will be separated by a space, it is only the last term in a search phrase that is likely not to trigger an auto-search via the first heuristic. users can click the search button whenever they wish, but they should never have to click it. the zuggest system abandons the search button entirely; ojax retains it, mainly in order to avoid confounding user expectations.37 while a search is in progress, the search button is greyed out and acquires a red border. this is particularly useful in alerting the user that a search has been automatically triggered.

this is the only feature of ojax that may have an impact on network load in terms of slightly higher traffic. however, the increased number of requests is offset by a reduction in the size of each response because the gui is not downloaded with it. for example, initiating a search in arc results in an average response size of 57.32k. the response is in the form of a complete html page. initiating a search in ojax results in an average response size of 7.96k. the latter comprises a web service response in xml. in other words, more than seven ojax auto-searches would have to be triggered before the size of the initial search result in arc was exceeded.

■ dynamic archive list

the use of ajax enables a static html page to contain a small component of dynamic data without the entire page having to be dynamically generated on the server. ojax illustrates this: the contents of the drop-down box listing the searchable archives are not hard-coded in the html page. rather, when the page is loaded, an ajax request for the set of available archives is generated. this is a useful technique; static html pages can be cached by browsers and proxy servers, and only the dynamic portion of the data, perhaps those used to personalize the page, need be downloaded at the start of a new session.
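a sketch of this load-time request is given below. again, this is not the ojax code: the /archives url, the <archive> elements, and the archiveSelect element id are hypothetical placeholders used only to illustrate fetching the one dynamic portion of an otherwise static, cacheable page.

// a sketch of the dynamic archive list: the page itself is static and
// cacheable; only the small list of searchable archives is requested
// when the page loads and poured into the drop-down box.
function loadArchiveList() {
  var request = new XMLHttpRequest();
  request.open("GET", "/archives", true);
  request.onreadystatechange = function () {
    if (request.readyState !== 4 || request.status !== 200) { return; }
    var select = document.getElementById("archiveSelect");
    var archives = request.responseXML.getElementsByTagName("archive");
    for (var i = 0; i < archives.length; i++) {
      var option = document.createElement("option");
      option.value = archives[i].getAttribute("id");
      option.text = archives[i].getAttribute("name");
      select.appendChild(option);
    }
  };
  request.send(null);
}
window.onload = loadArchiveList;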
■ dynamic scrolling

searches commonly produce thousands of results. typical systems, such as google and arc, make these results available via a succession of separate pages, thus requiring users to navigate between them. finding information by navigating multiple pages can take longer than scrolling down a single page, and users rarely look beyond the second page of search results.38 to avoid these problems and to encourage users to look at more of the available results, those results could be made available in one scrollable list. but, in a typical non-ajax application, accessing a scrollable list of, say, two thousand items would require the entire list to be downloaded via one enormous html page. this would be a huge operation; if it did not crash the browser, it would, at least, result in a substantial wait for the user.

the rico library provides a feature to enable dynamic scrollable areas. it uses ajax to fetch more records from the server when the user begins to scroll off the visible area. this is used in the display of search results in ojax, as illustrated in figure 4. to the user, it appears that the scrollable list is seamless and that all 4,678 search results are already downloaded. in fact, only 386 have been downloaded. the rest are available at the server. as the user scrolls further down, say to item 396, an ajax request is made for the next ten items. any item downloaded is cached by the ajax engine and need not be requested again if, for example, the user scrolls back up the list.

a dynamic information panel is available to the right of the scroll bar. it shows the current scroll position in relation to the beginning and end of the results set. in figure 4, the information panel indicates that there are 4,678 results for this particular search and that the current scroll position is at result number 386. this number updates instantly during scrolling, preserving the illusion that all results have been downloaded and providing users with dynamic feedback on their progress through the results set. this means that users do not have to wait for the main results window to refresh to identify their current position.

figure 3. auto-completion in the search terms field

figure 4. display of search results and dynamic information panel

■ auto-expansion of results

ojax aims to provide a compact display of key information, enabling users to see multiple results simultaneously. it also aims to provide simple access to full result details without requiring navigation to a new web page. in the initial results display, only one line each of the title, authors, and subject fields, and two lines of the abstract, are shown for each item. as the cursor is placed on the relevant field, the display expands to show any hidden detail in that field. at the same time, the background color of the field changes to blue. when the cursor is placed on the bar containing the resource identifier, all display fields for that item are expanded, as illustrated in figure 5. this expansion is enabled via simple cascading style sheet (css) features. for example, the following css declaration hides all but the first line of authors:

#searchresults td div { overflow:hidden; height: 1.1em }

when the cursor is placed on the author details, the overflow becomes visible and the display field changes its dimensions to fit the text inside it:

#searchresults td div:hover { overflow:visible; height:auto }
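the scroll-triggered fetching that rico provides for the results list could be approximated in plain javascript along the following lines. this is a sketch of the general technique rather than rico's api: the /results url with offset and limit parameters, the <record> elements, and the resultPane and resultList element ids are assumptions made for illustration. fetched batches are remembered so that scrolling back up never re-requests them.

// a sketch of dynamic scrolling: when the user scrolls near the end of the
// rows already fetched, the next batch of ten results is requested; offsets
// already requested are remembered so nothing is downloaded twice.
var fetchedOffsets = {};
function fetchBatch(offset) {
  if (fetchedOffsets[offset]) { return; }   // already requested or cached
  fetchedOffsets[offset] = true;
  var request = new XMLHttpRequest();
  request.open("GET", "/results?offset=" + offset + "&limit=10", true);
  request.onreadystatechange = function () {
    if (request.readyState !== 4 || request.status !== 200) { return; }
    var list = document.getElementById("resultList");
    var records = request.responseXML.getElementsByTagName("record");
    for (var i = 0; i < records.length; i++) {
      var row = document.createElement("li");
      row.textContent = (offset + i + 1) + ". " + records[i].getAttribute("title");
      list.appendChild(row);
    }
  };
  request.send(null);
}
// when the visible area comes within about 50 pixels of the loaded content,
// ask for the next batch, starting after the rows already rendered.
document.getElementById("resultPane").onscroll = function () {
  if (this.scrollTop + this.clientHeight >= this.scrollHeight - 50) {
    var loaded = document.getElementById("resultList").getElementsByTagName("li").length;
    fetchBatch(loaded);
  }
};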
■ sorting results

another method used by ojax to minimize upfront user investment is to provide initial search results before requiring the user to decide on sort options. because results are available so quickly and because they can be re-sorted so rapidly, it is not necessary to offer pre-search selection of sort options. ajax facilitates rapid presentation of results; after a re-sort, only those on the first screen must be downloaded before they can be presented to the user. results may be sorted by title, author, subject, abstract, and resource identifier. these options are listed on the gray bar immediately above the results list. clicking one of these options sorts the results in ascending order; an upward-pointing arrow appears to the right of the sort option chosen, as illustrated in figure 6. clicking on the option again sorts in descending order and reverses the direction of the arrow. clicking on the arrow removes the sort; the results revert to their original order. functionality for the sort feature is provided by the rico javascript library. server-side implementation supports these features by caching search results so that it is not necessary to regenerate them via a database query each time.

figure 5. auto-expansion of all fields for item number 386

figure 6. results being sorted in ascending order by title

■ search history

several experimental systems—for example, zuggest—have employed ajax to facilitate a search-history feature. a similar feature could be provided for ojax. a button could be added to the right of the results list. when chosen, it could expand a collapsible search-history sidebar. as the cursor was placed on one of the previous searches listed in the sidebar, a call out, that is, a speech bubble, could be displayed. this could provide further information such as the number of matches for that search and a summary of the search results clicked on by the user. clicking one of the previous searches would restore those search results to the main results window. this feature would take advantage of the ajax persistent javascript engine to maintain the history. its use could help counter concerns about ajax technology “breaking” the back button; the feature could be implemented so that the back button returned the user to the previous entry in the search history.39 in fact, this implementation of back-button functionality could be more useful than the implementation in google, where hitting the back button is likely to take the user to an interim results page; for example, it might simply take the user from page 3 of results to page 2 of results.

■ scrapbook

users browsing through search results on ojax would require some simple method of maintaining a record of those resource details that interested them. ajax could enable the development of a useful scrapbook feature to which such resource details could be copied and stored in the persistent javascript engine. ojax could further leverage a shared bookmark web service, such as del.icio.us or furl, to save the scrapbook for use in future sessions and to share it with other members of a research or interest group.40

■ potential developments for ojax

as well as searching a database of harvested metadata, the ojax user interface could also be used to search an oai-pmh-compliant repository directly. with appropriate implementation, all of ojax’s current features could be made available, apart from auto-completion. a recent development has enabled the direct indexing of repositories by google using oai-pmh.41 the latter provides google with additional metadata that can be searched via the google web services apis. the current ojax web services could be replaced by the google apis, thus eliminating the need for ojax to host any server-side components. hence, ojax could become an alternative gui for google searching.

■ conclusion

ojax demonstrates that the use of ajax can enable features in web applications that, until now, have been restricted to desktop applications.
in ojax, it facilitates a simple, nonthreatening, but powerful search user interface. page navigation is eliminated; dynamic feedback and a low initial investment on the part of users encourage experimentation and enable enactive learning. the use of ajax could similarly transform other web applications aimed at library patrons. however, ajax is still maturing, and the barrier to entry for developers remains high. we are a long way from an ajax button appearing in dreamweaver. reusable, well-tested components, such as rico, and software frameworks, such as ruby on rails, sun’s j2ee framework, and microsoft’s atlas, will help to make ajax technology accessible to a wider range of developers.42 as with all new technologies, there is a temptation to use ajax simply because it exists. as ajax matures, it is important that its focus does not become the enabling of “cool” features but remains the optimization of the user experience.

references and notes

1. ojax homepage, http://ojax.sourceforge.net (accessed apr. 5, 2006).
2. j. j. garrett, “ajax: a new approach to web applications,” feb. 18, 2005, www.adaptivepath.com/publications/essays/archives/000385.php (accessed nov. 11, 2005).
3. ibid.
4. j. nielsen, “the need for speed,” alertbox, mar. 1, 1997, www.useit.com/alertbox/9703a.html (accessed nov. 11, 2005).
5. dynamic html and xml: the xmlhttprequest object, http://developer.apple.com/internet/webcontent/xmlhttpreq.html (accessed apr. 5, 2006).
6. javascript object notation, wikipedia definition, http://en.wikipedia.org/wiki/json (accessed apr. 5, 2006).
7. google gmail, http://mail.google.com (accessed apr. 5, 2006); google suggest, www.google.com/webhp?complete=1&hl=en (accessed apr. 5, 2006); google groups, http://groups.google.com (accessed apr. 5, 2006); google maps, http://maps.google.com (accessed apr. 5, 2006).
8. p. binkley, “ajax and auto-completion,” quædam cuiusdam blog, may 18, 2005, www.wallandbinkley.com/quaedam/?p=27 (accessed nov. 11, 2005).
9. francis shanahan, zuggest, www.francisshanahan.com/zuggest.aspx (accessed apr. 5, 2006).
10. a. rhyno, “ajax and the rich web interface,” librarycog blog, apr. 10, 2005, http://librarycog.uwindsor.ca:8087/artblog/librarycog/1113186562 (accessed nov. 11, 2005); r. tennant, “tennant’s top tech trend tidbit,” lita blog, june 22, 2005, http://litablog.org/?p=35 (accessed nov. 11, 2005).
11. t. hickey, “ajax and web interfaces,” outgoing blog, mar. 31, 2005, http://outgoing.typepad.com/outgoing/2005/03/web_application.html (accessed nov. 11, 2005).
12. oclc deweybrowser, http://ddcresearch.oclc.org/ebooks/fileserver (accessed apr. 5, 2006).
13. hickey, “ajax and web interfaces.”
14. j. wusteman, “from ghostbusters to libraries: the power of xul,” library hi tech 23, no. 1 (2005a), www.ucd.ie/wusteman/ (accessed nov. 11, 2005); cover pages, microsoft extensible application markup language (xaml), http://xml.coverpages.org/ms-xaml.html (accessed apr. 5, 2006).
15. google extensions for firefox, http://toolbar.google.com/firefox/extensions/index.html (accessed apr. 5, 2006).
16. c. adams, “ajax: usable interactivity with remote scripting,” sitepoint, jul. 13, 2005, www.sitepoint.com/article/remote-scripting-ajax (accessed nov. 11, 2005).
17. xhtml 2.0, w3c working draft, may 27, 2005, www.w3.org/tr/2005/wd-xhtml2-20050527 (accessed apr. 5, 2006).
18. client/server model, http://en.wikipedia.org/wiki/client/server (accessed apr. 5, 2006).
19. dll hell, http://en.wikipedia.org/wiki/dll_hell (accessed apr. 5, 2006).
20. j. nielsen, “the need for speed.”
21. service-oriented architecture, http://en.wikipedia.org/wiki/service-oriented_architecture (accessed apr. 5, 2006).
22. j. wusteman, “realizing the potential of web services,” oclc systems & services: international digital library perspectives 22, no. 1 (2006): 5–9.
23. arc—a cross archive search service, old dominion university digital library research group, http://arc.cs.odu.edu (accessed apr. 5, 2006); niso metasearch initiative, www.niso.org/committees/ms_initiative.html (accessed apr. 5, 2006); arc download page, sourceforge, http://oaiarc.sourceforge.net (accessed apr. 5, 2006).
24. open archives initiative protocol for metadata harvesting, www.openarchives.org/oai/openarchivesprotocol.html (accessed apr. 5, 2006).
25. ojax download page, sourceforge, http://sourceforge.net/projects/ojax (accessed apr. 5, 2006).
26. apache jakarta project, http://jakarta.apache.org (accessed apr. 5, 2006); apache jakarta commons dbcp, http://jakarta.apache.org/commons/dbcp (accessed apr. 5, 2006); apache jakarta commons dbutils, http://jakarta.apache.org/commons/dbutils (accessed apr. 5, 2006).
27. agile software development definition, wikipedia, http://en.wikipedia.org/wiki/agile_software_development (accessed apr. 5, 2006).
28. prototype javascript framework, http://prototype.conio.net (accessed apr. 5, 2006); script.aculo.us, http://script.aculo.us (accessed apr. 5, 2006); rico, http://openrico.org/rico/home.page (accessed apr. 5, 2006).
29. sabre, www.sabre.com (accessed apr. 5, 2006).
30. niso metasearch initiative, www.niso.org/committees/ms_initiative.html (accessed apr. 5, 2006).
31. google toolbar, http://toolbar.google.com (accessed apr. 5, 2006).
32. dublin core metadata initiative, http://dublincore.org (accessed apr. 5, 2006).
33. mysql, www.mysql.com (accessed apr. 5, 2006).
34. google help center, advanced operators, www.google.com/help/operators.html (accessed apr. 5, 2006).
35. apache lucene, http://lucene.apache.org (accessed apr. 5, 2006).
36. j. nielsen, “search: visible and simple,” alertbox, may 13, 2001, www.useit.com/alertbox/20010513.html (accessed nov. 11, 2005).
37. francis shanahan, zuggest.
38. j. r. baker, “the impact of paging versus scrolling on reading online text passages,” usability news 5, no. 1 (2003), http://psychology.wichita.edu/surl/usabilitynews/51/paging_scrolling.htm (accessed nov. 11, 2005); j. nielsen, “search: visible and simple.”
39. j. j. garrett, “ajax: a new approach to web applications.”
40. del.icio.us, http://del.icio.us (accessed apr. 5, 2006); furl, www.furl.net (accessed apr. 5, 2006).
41. google sitemaps (beta) help, www.google.com/webmasters/sitemaps/docs/en/other.html (accessed apr. 5, 2006).
42. ruby on rails, www.rubyonrails.org (accessed apr. 5, 2006); java 2 platform, enterprise edition (j2ee), http://java.sun.com/j2ee (accessed apr. 5, 2006); m. lamonica, “microsoft gets hip to ajax,” cnet news.com, june 27, 2005, http://news.com.com/microsoft+gets+hip+to+ajax/2100-1007_3-5765197.html (accessed nov. 11, 2005).

lib-mocs-kmc364-20140106084018

title-only entries retrieved by use of truncated search keys

frederick g. kilgour, philip l. long, eugene b. liederman, and alan l. landgraf: the ohio college library center, columbus, ohio.

an experiment testing utility of truncated search keys as inquiry terms in an on-line system was performed on a file of 16,792 title-only bibliographic entries.
use of a 3,3 key yields eight or fewer entries 99.0% of the time. a previous paper (1) established that truncated derived search keys are efficient in retrieval of entries from a name-title catalog. this paper reports a similar investigation into the retrieval efficiency of truncated keys for extracting entries from an on-line, title-only catalog; it is assumed that entries retrieved would be displayed on an interactive terminal. earlier work by ruecking (2), nugent (3), kilgour (4), dolby (5), coe (6), and newman and buchinski (7) consisted of investigations of search keys designed to retrieve bibliographic entries from magnetic tape files. the earlier paper in this series and the present paper investigate retrieval from on-line files in an interactive environment. similarly, the work of rothrock (8) inquired into the efficacy of derived truncated search keys for retrieving telephone directory entries from an on-line file. since the appearance of the previous paper, the ohio state university libraries have developed and activated a remote catalog access and circulation control system employing a truncated derived search key similar to those described in the earlier paper. however, osu adopted a 4,5 key consisting of the first four characters of the main entry and the first five characters of the title excluding initial articles and a few other nonsignificant words. whereas the osu system treats the name and title as a continuous string of characters, the experiments reported in this and the previous paper deal only with the first word in the name and title, articles always being excluded. the bell system has also recently activated a large traffic experiment in the san francisco bay area. the master file in this system contains 1,300,000 directory entries. the system utilizes truncated derived keys like those investigated in the present experiments. materials and methods the file used in this experiment was described in the earlier paper (1), except that this experiment investigates the title-only entries. the same programs used in the name-title investigation were used in this experiment; the title-only entries were edited so that the first word of the title was placed in the name field and the remaining words in the title field. as was the case formerly, it was necessary to clean up the file. single-word titles often carried in the second or title field such expressions as one year subscription or vol 16 1968. in addition there were spurious character strings that were not titles, and in such cases the entire entry was removed from the file. thereby, the original 17,066 title entries were reduced to 16,792. the truncated search keys derived from these title-only entries consist of the initial characters of the first word of the title and of the second word of the title. if there was no second word, blanks were employed. if either the first or second word contained fewer characters than the key to be derived, the key was left-justified and padded out with blanks. to obtain a comparison of the effectiveness of truncated search keys derived from title-only entries as related to keys derived from name-title entries, a name-title entry file of the same number of entries (16,792) was constructed. a series of random numbers larger than the number of entries in the original name-title file (132,808) was generated and one of the numbers was added to each of the 132,808 name-title entries in sequence.
next the file was sorted by number so that a randomized file was obtained. then the first 16,792 name-title entries were selected. the same program analyzed keys derived from this file. results table 1 presents the maximum number of entries to be expected in 99% of replies for the file of 16,792 title-only entries as well as for the name-title file containing the same total of entries. for example, when a large number of random requests are put to the title-only file using a 3,3 search key, the prediction is that 99.0% of the time, eight or fewer replies will be returned. however, in the case of the name-title file, only two replies will be returned 99.3% of the time. the 3,3 key produced only thirteen replies (.12% of the total number of 3,3 keys) containing twenty-one or more entries. the highest number of entries for a single reply for the 3,3 key was 235 ("jou,of" derived from journal of). the next highest number was 88 ("adv,in" for advances in).

table 1. maximum number of entries in 99% of replies

              title-only entries                name-title entries
search key    maximum entries    percent        maximum entries    percent
              per reply          of time        per reply          of time
2,2                                99.1              7               99.0
2,3                                99.1              4               99.6
2,4               11               99.0              3               99.5
3,2                9               99.1              3               99.2
3,3                8               99.0              2               99.3
3,4                8               99.1              2               99.5
4,2                8               99.1              2               99.2
4,3                7               99.0              2               99.6
4,4                7               99.1              2               99.7

discussion the two words from which the keys are derived in name-title entries constitute a two-symbol markov string of zero order, since the name string and title string are uncorrelated. however, the two words from which keys are derived in the title-only entry are first order markov strings, since they are consecutive words from the title string and are correlated. the consequence of these two circumstances on the effectiveness of derived keys is clearly presented in table 1. the keys from name-title entries consistently produce fewer maximum entries per reply. therefore, it is desirable to derive keys from zero order markov strings wherever possible. the ohio state university libraries contain over two and a quarter million volumes, but on 9 february 1971 there were only 47,736 title-only main entries in the catalog. the file used in the present experiment is 35% of the size of the osu file. since 99% of the time the 3,3 key yields eight or fewer titles, it is clear that such a key will be adequate for retrieval for library on-line, title-only catalogs. the 3,3 key also possesses the attractive quality of eliminating the majority of human misspelling as pointed out in the earlier paper (1). there remains, however, the unsolved problem of the efficient retrieval of such titles as those beginning with "journal of" and "advances in". it appears that it will be necessary to devise a special algorithm for those relatively few titles that produce excessively high numbers of entries in replies. in the previous investigation it was found that a 3,3 key yielded five or fewer replies 99.08% of the time from a file of 132,808 name-title entries. table 1 shows that for a file of only 16,792 entries the 3,3 key produces two or fewer replies 99.3% of the time. these two observations suggest that as a file of bibliographic entries increases, the maximum number of entries per reply does not increase in a one-to-one ratio, since the maximum number of entries rose from two to five while the total size of the file increased from one to approximately eight.
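the key-derivation and tallying procedure underlying these figures can be restated compactly in modern terms. the following sketch is purely illustrative (it is not the original analysis programs, and the sample titles are invented): it derives an m,n truncated key from the first two words of a title, padding short or missing words with blanks as described above, and counts how many entries share each key.

from collections import Counter

def derive_key(first_word, second_word, m=3, n=3):
    # truncate each word to the key length; a short or missing word is
    # left-justified and padded out with blanks, as in the methods section
    part1 = (first_word or "").lower()[:m].ljust(m)
    part2 = (second_word or "").lower()[:n].ljust(n)
    return part1 + "," + part2

def reply_sizes(titles, m=3, n=3):
    # tally how many entries fall under each derived key; the size of each
    # tally is the number of entries that would come back in one reply
    counts = Counter()
    for title in titles:
        words = title.split()
        first = words[0] if words else ""
        second = words[1] if len(words) > 1 else ""
        counts[derive_key(first, second, m, n)] += 1
    return counts

# invented sample titles, initial articles already removed
sample = ["journal of library automation", "journal of documentation",
          "advances in librarianship", "data processing"]
print(reply_sizes(sample))
# Counter({'jou,of ': 2, 'adv,in ': 1, 'dat,pro': 1})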
further research must be done in this area to determine the relative behavior of derived truncated keys as their associated file sizes vary. conclusion this experiment has produced evidence that a series of truncated search keys derived from a first order markov word string in a bibliographic description yields a higher number of maximum entries per reply than does a series derived from a zero order markov string. however, the results indicate that the technique is nonetheless sufficiently efficient for application to large on-line library catalogs. use of a 3,3 search key yields eight or fewer entries 99.0% of the time from a file of 16,792 title-only entries. acknowledgment this study was supported in part by national agricultural library contract 12-03-01-5-70 and by office of education contract oec-0-72-2289 (506). references 1. f. g. kilgour; p. l. long; e. b. leiderman: "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7 (1970), pp. 79-82. 2. f. h. ruecking, jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation 1 (december 1968), 227-38. 3. w. r. nugent: "compression word coding techniques for information retrieval," journal of library automation 1 (december 1968), 250-60. 4. f. g. kilgour: "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science 5 (1968), pp. 133-36. 5. j. l. dolby: "an algorithm for variable-length proper-name compression," journal of library automation 3 (december 1970), 257-75. 6. m. j. coe: "mechanization of library procedures in the medium-sized medical library: x. uniqueness of compression codes for bibliographic retrieval," bulletin of the medical library association 58 (october 1970), 587-97. 7. w. l. newman; e. j. buchinski: "entry/title compression code access to machine readable bibliographic files," journal of library automation 4 (june 1971), 72-85. 8. h. i. rothrock, jr.: computer-assisted directory search; a dissertation in electrical engineering (philadelphia: university of pennsylvania, 1968). lib-mocs-kmc364-20131012112749 147 who rules the rules? "why can't the english teach their children how to speak?" wondered henry higgins, implying that a lack of widely and consistently followed rules of usage created linguistic backwardness and anarchy. higgins' question might be rephrased today as: "when will the code teach its founders how to catalog?" the library of congress has historically fitted catalog codes to its own practices rather than following them slavishly. the best example is the lamentable policy of superimposition: continued use of preestablished forms of names that are not in compliance with the paris principles or aacr1. this was a cause of widespread confusion and complaint and the practice was eventually discontinued ... well, sort of discontinued. the various interpretations of aacr1, the inclusion of new rules, and pressure for further modifications eventually led to the drafting of aacr2, a code that was supposed to end variance and controversial practices. one might assume that including lc as a principal author of the new text and an lc official as one of the editors might result in a code that it could actually follow. judging by the spate of exceptions and interpretations made so far (more than 300), this has not been the case.
in the place of superimposition, we have new impositions known as "compatible headings." they may not be readily ascertained according to the rules, but have been granted a sort of bibliographic squatter's rights. although it would be simpler for catalogers to follow the rules consistently, they must instead check several cataloging service bulletins and name authorities to see whether lc has determined that a given personal, corporate, or serial name is already "compatible" with aacr2. this can result in cataloging delays, higher processing costs, and inconsistent entries. aacr2 and uncertainties regarding its application by lc have been widely credited with lower cataloging productivity. this is not to imply that lc is behaving in a strictly arbitrary or capricious manner vis-a-vis the code. they can be seen as caught on the horns of a trilemma, with vast internal needs and increasing external demands competing for a shrinking budget. president reagan may have whispered sweet nothings during national library week, but during budget hearings it became clear that libraries are not as "truly needy" as impoverished generals and interior decorators. decisions to depart from aacr2 have been based primarily on cost factors. the decision by the rtsd catalog code revision committee and the joint steering committee not to consider cost and implementation factors has led both to widespread opposition to the code resulting in a one-year delay in implementation, and to the modifications that lc has made and is making. some variations such as using "dept." for "department" and "house" for "house of representatives" make fiscal and common sense. many other lc changes are simply bibliographic nit-picking, minor irritants to catalogers who must flip back and forth between the text of aacr2 and half a dozen bulletins to settle a minor point of description. why didn't lc representatives attempt to say, "wait a minute-we just can't do that now," while the code was being considered rather than after it was published? anyway, considering that lc was starting up a whole new catalog and closing the old one, one wonders why rules not to be applied retrospectively had to be tinkered with to such an extent. major questions still to be resolved include not only the compatible-name quandary, but the treatment of serials, microform reproductions, establishment of corporate names and determination of when works "emanate from" corporate bodies, and the romanization of slavic names. the decision to use title entry for serials and monographic series even in the case of generic titles has been controversial. there are, of course, exceptions to the rules, and there will be differences in how uncertain catalogers construct complex entries with parenthetical modifiers. unfortunately, rules establishing entries for serials have sometimes been muddied rather than clarified in the bulletin. consider the example in the winter 1981 issue wherein the bulletin of the engineering station of west virginia university is entered under "bulletin," while the same publication for the entire university is entered under "west virginia university bulletin." also, consider the complex cross-reference structure required to direct users between the two files, both of which may well be split again, historically, between author/title and title main entry. this is a special problem in the case of large monographic series generated by corporate bodies.
the lc position on microform reproductions of previously published works is clearer, but is still a point of controversy. they have decided to provide the imprint and collation (er, make that "publication, distribution, etc., area" and "physical description area") of the original work, with a description of the microform in a note. in other words, they're sticking to aacr1. the rtsd ccs committee on cataloging: description and access is currently trying to resolve this conflict, one in which many research libraries have sided with lc. this body is also trying to unravel the mystique of "corporate emanation" introduced in aacr2. another sore point has been the lc decision to follow an alternative rule, which prefers commonly known forms of romanized names over those established via systematic romanization. that lc is correctly following the spirit of the general principle for personal names is little comfort to research libraries with large slavic collections. how are other libraries responding to the murky form of aacr2? some are closing old card catalogs and continuing them with com or temporary card supplements. some of these are establishing cross-reference links between variant forms of names between catalogs, while others are not. some are keeping their catalogs open and shifting files, while others are splitting files. some are shifting some files and splitting others. aacr2 was intended to provide headings that could be easily ascertained by the user. ironically, the temporary result is scrambled catalogs: access systems involving multiple lookups and built-in confusion. until most bibliographic records are in machine-readable form under reliable authority control this will continue to be the case. authority control, it would seem, has long been an idea whose time has come but whose application is yet to be realized. the cooperative efforts of the library of congress and the major bibliographic utilities to establish reliable automated authority control will do much to ameliorate the problems presented by aacr2. it would also be helpful if lc, perhaps with the financial assistance of other libraries, networks, and foundations, would publish what might be called aacr2½-not a new edition of the code but one accurately reflecting actual lc practice. finally, future code makers would be wise to consider cost and other implementation factors in their deliberations. professor higgins, ever the optimist, would rather sing "wouldn't it be loverly" than hear another verse of "i did it my way." james r. dwyer editor's notes title change it often seems that the only things that change their names as often as library publications are standards organizations. not to be left out, jola will be called information technology and libraries beginning with volume 1, number 1, the march 1982 issue. this name was approved by the lita board in san francisco this june as more accurately reflecting the true scope of the journal. new section with this issue, we are initiating a new section: "reports and working papers." this is intended to help disseminate documents of particular interest to the jola readership. we solicit suggestions of documents, often developed as working papers for a specific purpose or group but of interest and value to our readership. in general, documents in this section are neither refereed nor edited. mitch i take great personal pleasure in publishing mike malinconico's speech upon presenting the 1981 lita award to mitch freedman.
readers' comments we do continue to solicit suggestions about the journal but receive few. is anybody reading it? if you have any thoughts about what we should or shouldn't do, we would welcome your sharing them. editorial | truitt 3 marc truitteditorial i doubt that many of the blog people are in the habit of sustained reading of complex texts. —michael gorman, 2005 s o, three plus years after the fact, why am i opening with michael gorman’s unfortunate characterization of those he labeled “blog people”? i have no interest in reopening this debate, honestly! but the problem with generalizations, however unfair, is that at their heart there is just enough substance to make them “stick”—to give them a grain or two of credibility. gorman’s words struck a chord in me that existed before his charge and has continued to exist to this day. the substance in gorman’s words had little to do with these “blog people” as such; rather, my interest was piqued by the implications in his remark about how we all deal with “complex texts” and the “sustained reading” of the same. in a time of wide availability of full-text electronic articles, it has become so easy and tempting to cherry pick the odd phrase here or there, without study of the work as a whole. how has scholarship especially been changed by the ease with which we can reduce works to snippets without having considered their overall context? i’m not arguing that scholarly research and writing hasn’t always been at least in part about finding the perfect juicy quotation around which we then weave our own theses. many of us well recall the boxes of 3x5” citation and 5x8” quotation files that we or our patrons laboriously assembled through weeks, months, and years of detailed research. but if the style of compiling these files that i witnessed (and indeed did) is any guide, their existence was the product of precisely that “sustained reading of complex texts” of which gorman spoke. my vague, nagging sense is that what is changing is this style of approaching whole texts. i wondered then about how much scholarly research today is driven by keyword searches of digitized texts that then essentially produce “virtual quotation files” without our having had to struggle with their context in the whole of the original source text? fast forward three years. lately, several articles touching on our changing ways of interacting with resources have appeared in both scholarly and popular venues, and these have served to underline my sense that we are missing something because of our growing lack of engagement with whole texts. writing in the july/august issue of the atlantic monthly, nicholas carr asks “is google making us stupid?” drawing an analogy to the scene in the film 2001: a space odyssey, in which astronaut dave bowman disables supercomputer hal’s memory circuits, carr says i can feel it, too. over the past few years i’ve had an uncomfortable sense that someone, or something, has been tinkering with my brain, remapping the neural circuitry, reprogramming the memory. my mind isn’t going—so far as i can tell—but it’s changing. i’m not thinking the way i used to think. i can feel it most strongly when i’m reading. immersing myself in a book or a lengthy article used to be easy. my mind would get caught up in the narrative or the turns of the argument, and i’d spend hours strolling through long stretches of prose. that’s rarely the case anymore. now my concentration often starts to drift after two or three pages. 
i get fidgety, lose the thread, begin looking for something else to do. i feel as if i’m always dragging my wayward brain back to the text. the deep reading that used to come naturally has become a struggle.1 carr goes on to explain that “what the net seems to be doing is chipping away my capacity for concentration and contemplation. my mind now expects to take in information the way the net distributes it: in a swiftly moving stream of particles. once i was a scuba diver in the sea of words. now i zip along the surface like a guy on a jet ski.”2 carr’s nagging fear found similar expression among some tech-savvy participants of library online forums; one of the more interesting comments appeared on the web4lib electronic discussion list. in a discussion of the article, tim spalding of librarything observed that he himself had experienced what he dubbed “the google effect” and noted something is lost. . . . human culture often advances by externalizing pieces of our mental life—writing externalizes memory, calculators externalize arithmetic, maps, and now gps, externalize way-finding, etc. each shift changes the culture. and each shift comes with a cost. nobody memorizes texts anymore, nobody knows the times tables past ten or twelve and nobody can find their way home from the stars and the side of the tree the moss grows on.3 meanwhile, another article appeared on a closely related topic, this time in the journal science. james a. evans observed that, because “scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse,” the easy availability of electronic resources was resulting in an “ironic change” for scientific marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 4 information technology and libraries | september 2008 scholarship, in that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles. the forced browsing of print archives may have stretched scientists and scholars to anchor findings deeply into past and present scholarship. searching online is more efficient and following hyperlinks quickly puts researchers in touch with prevailing opinion, but this may accelerate consensus and narrow the range of findings and ideas built upon.4 evans’s research highlights an additional irony: an unintended benefit to the scholarly process in the paperbased world was “poor indexing,” since it encouraged browsing through less relevant, older, or more marginal literature. this browsing had the effect of “facilitat[ing] broader comparisons and led researchers into the past. modern graduate education parallels this shift in publication—shorter in years, more specialized in scope, culminating less frequently in a true dissertation than an album of articles.”5 what is one to make of all of this? at the outset, i wish to state clearly that i am not some sort of anti e-text luddite. electronic texts are a fact of life, and are becoming moreso every day. even though they are in their infancy as a medium, they’ve already transformed the landscape of bibliographic access. my interest is not with the tool, but with the manner in which we are using it. i began by suggesting that i share with gorman a concern about how we increasingly engage with “complex texts” today. 
unlike him, though, my concern is not limited only to the so-called blog people (whomever they may be), but indeed, it includes all of us. with the explosion in easily accessible electronic texts, our ideas and habits concerning interaction with these texts are changing, sometimes in unintended ways. in a recent informal survey i conducted of my colleagues at work, i asked, “have you ever read an e-book (not just a journal article) from (virtual) cover to (virtual) cover?” for those whose answer was affirmative, i also asked, “how many such books have you read in their entirety?” out of twenty-odd responses, three individuals answered that yes, they had had occasion to read an entire e-book (for a total of six books among the three “yes” respondents, which seemed surprisingly high to me). of greater interest, though, were those who chose to question the premise of the survey, arguing that people don’t “read” e-books the way that they read paper ones. it does make one wonder, then, how amazon thinks it possesses a viable business model in the kindle e-book reader, for which it currently lists an astounding 140,000+ available e-books. clearly, some e-books are being read as whole texts, by some people, for some purposes. but i suspect that’s another story.6 carr and evans use slightly differing imagery to describe a similar phenomenon. carr closes with a reference back to the death of 2001’s hal, saying, “as we come to rely on computers to mediate our understanding of the world, it is our own intelligence that flattens into artificial intelligence.”7 evans, on the other hand, compares contemporary scientific researchers to newton and darwin, each of whom produced works that “not only were engaged in current debates, but wove their propositions into conversation with astronomers, geometers, and naturalists from centuries past.” twenty-first-century scientists and scholars, by contrast, are able because of readily available electronic resources “to frame and publish their arguments more efficiently, [but] they weave them into a more focused—and more narrow—past and present.” 8 perhaps the most succinct statement, though, comes from librarything’s tim spalding, who summarized the problem thusly: “we advance by becoming dumber.”9 an ital research and publishing opportunity for an inquisitive and enterprising scholar, perhaps? i’d welcome the manuscript! shameless plugs department. by the time you read this, we at ital will have launched our new blog, italica (http://ital-ica.blogspot.com). italica addresses a need we on the ital editorial board have long sensed; that is, an area for “letters to the editor,” updates to articles, supplementary materials we can’t work into the journal—you name it. one of the most important features of italica will be a forum for readers’ conversations with our authors: we’ll ask authors to host and monitor discussion for a period of time after publication so that you’ll then have a chance to interact with them. italica is currently a pilot project. for our first issue we will have begun with a discussion hosted by jennifer bowen, whose article “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase i” was published in the june 2008 issue of ital. for our second italica, we plan to expand coverage and discussion to include all articles and other features in the september issue you now have in hand. italica is sure to become a stimulating supplement to and forum for topics originating in ital. 
we look forward to seeing you there! references and notes extract. michael gorman, “revenge of the blog people!” library journal (feb. 15, 2005) www.libraryjournal.com/article/ ca502009.html (accessed july 21, 2008). 1. nicholas carr, “is google making us stupid?” the atlantic monthly 301 (july/aug. 2008) www.theatlantic.com/ doc/200807/google (accessed july 23, 2008). editor’s column | truitt 5 2. ibid. 3. tim spalding, “re: ‘is google making us stupid? what the internet is doing to our brains,’” web4lib discussion list post, june 19, 2008, http://article.gmane.org/gmane.education .web4lib/12349 (accessed july 24, 2008). 4. james a. evans, “electronic publication and the narrowing of science and scholarship,” science (july 18, 2008) www .sciencemag.org/cgi/content/full/321/5887/395 (accessed july 24, 2008). emphasis added. 5. ibid. 6. as of 5:30pm (est), july 24, 2008, amazon’s website listed 145,591 “kindle books.” www.amazon.com/s/qid=1216934603/ ref=sr_hi?ie=utf8&rs=154606011&bbn=154606011&rh=n%3a1 54606011&page=1. 7. carr, “is google making us stupid?” 8. evans, “electronic publication and the narrowing of science of scholarship.” 9. spalding, “re: ‘is google making us stupid?’” lib-mocs-kmc364-20140106084043 211 name-title entry retrieval from a marc file philip l. long, head, automated systems research and development and frederick g. kilgour, director: ohio college library center, columbus, ohio a test of validity of earlier findings on 3,3 search-ke y retrieval from an in-process file for retrieval from a marc file. probability of number of entries retrieved per reply is essentially the same for both files. this study was undertaken to test the applicability of previous findings on retrieval of name-title entries from a technical processing system fil e ( 1 ) to retrieval from a marc file; the technique for retrieval employs truncated 3,3 search keys. materials and methods the study cited above employed a file of 132,808 name-title entries obtained from the yale university library's machine aided technical processing system. bibliographic control was not maintained for the generation of records in this file , with the result that the file contained errors that simulated errors in the requests library users put to catalogs. the marc file employed in the present study contains 121,588 name-title entries that are nearly error free. whereas the marc file possesses few records bearing foreign titles, the yale file has a significantly higher percentage of such titles, as would be expected for a large university library. initial articles were deleted in yale titles, but only english articles in marc titles because the language of foreign language titles is not identified in marc. 212 journal of library automation vol. 4/4 december, 1971 design of the program used to analyze the marc file was the same as that for the program employed in the previous study. however, the new program runs on a xerox data systems sigma 5 computer. the test employed the 3,3 search key to make possible comparison with previous results. results table 1 presents the percentage of time that up to five replies can be expected, assuming equal likelihood of key choice. inspection of the table reveals that there is no significant difference between the findings from the yale and the marc files. table 1. 
probability of number of entries per reply using 3,3 search key

number of replies    cumulative probability percentage
                     yale file       marc file
1                      78.58           79.98
2                      92.75           93.28
3                      96.83           96.93
4                      98.40           98.26
5                      99.08           98.91

discussion the same result was expected for the marc file as had been obtained earlier from the yale file. possible influences that might have led to different results were the existence of errors in the yale file, a significant proportion of foreign titles in the yale file as compared to the nearly all-english marc file, and the inability to mechanically delete the initial articles in the few foreign language marc titles. it is most unlikely that the effects of these differences are masking one another. conclusion the findings of a previous study on the effectiveness of retrieval of entries from a large bibliographic file (1) by use of a truncated 3,3 search key have been confirmed for a similarly large marc file. reference 1. kilgour, frederick g.; long, philip l.; leiderman, eugene b.: "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science, 7 (1970), 79-81. lib-mocs-kmc364-20131012114038 278 circulation systems past and present* maurice j. freedman: school of library service, columbia university, new york city. *this article is adapted from a speech delivered at rutgers university. manuscript received november 1980; revised may 1981; accepted july 1981. a review of the development of circulation systems shows two areas of change. the librarian's perception of circulation control has shifted from a broad service orientation to a narrow record-keeping approach and recently back again. the technological development of circulation systems has evolved from manual systems to the online systems of today. the trade-offs and deficiencies of earlier systems in relation to the comprehensive services made possible by the online computer are detailed. in her 1975 library technology reports study of automated circulation control systems, barbara markuson contrasted what she called "older" and "more recent" views of the circulation function. the "older" or traditional view was that circulation control centered on conservation of the collection and recordkeeping. the "more recent" attitude encompasses "all activities related to the use of library materials."1 it appears that this latter outlook is not as new as markuson had suggested. in 1927, jennie m. flexner's circulation work in public libraries described the work of circulation as the "activity of the library which through personal contact and a system of records supplies the reader with the [materials] wanted."2 flexner went on to characterize four major functions of circulation as follows: (1) the staff must know the books in the collection, and have a working familiarity with them. (2) the staff must know the readers; their wants, interests, etc. (3) the circulation staff must fully understand the library mission and policies and work harmoniously with those in related departments. (4) the circulation department has its own particular duty to perform .... effective routines and techniques must be established by the library and mastered by the staff if the distribution of books is to be properly accomplished and the public is to have the fullest use of the resources of the institution.
the library must be able to locate books, on the shelves or in circulation; to know who is using material and how the reader can be traced, if he is misusing or unduly withholding the books drawn. 3 the function of circulation has not changed since flexner's description. even within the context of online circulation systems, it is absolutely essential that the circulation system be seen in as broad a context as possible. it is not merely an electromechanical phenomenon staffed by automatonclerks. circulation services involve that function which is ultimately one of the most fundamental: the satisfactory bringing together of the library user and the materials sought by that person. it follows, then, that the mechanism and means of delivery and control of the service are only a small part, and certainly not the most important part of the circulation function. knowing your collection, your readers, and clearly knowing your library's mission are crucial prerequisites for the effective circulation of library materials. an examination of the history of circulation systems and their evolution to the present state reveals the change in outlook from a narrow view of the circulation function to a broader view. let us begin by establishing the basic elements of record keeping, upon which circulation control is based. there are three categories of records: 1. for the collection of materials, books, tapes, microforms, etc., comprising the library. 2. for the readers or users of the library service. 3. for the wedding or concatenation of the first two, i.e., the library user's use or borrowing of the library's materials. a minimal circulation model is a set of procedures or recordkeeping with respect to only the third category, i.e., records of the materials held by the library user outside of the library. a total or complete system would then be one that provides for all three categories. using these criteria to judge the level of control provided by the various circulation systems of the past, let us review. the earliest method of circulation control was the chain method. in this case, "circulation" is not an accurate term; "use" of materials is more appropriate, as the collection did not circulate. books were chained to the wall and the user did not take the material outside of the library. the minimal circulation model is not met, and records were not required. several hundred years later, the ledger system's first iteration involved a simple notation into a ledger. the identification of the book-call number and/or author and title-and the borrower's identification were recorded. upon the return of the book, the borrower or the receiving clerk initialed the ledger entry or otherwise indicated the return of the item. minimal circulation control is met. a more developed or sophisticated ledger system exceeded this minimal circulation model. the new ledger had each page headed by a different 280 journal of library automation vol. 14/4 december 1981 borrower or registration number. consequently, a given user had all of his or her charges recorded on the given page indicated by the user's number. the economy of not having to write the borrower's name for every transaction was made possible through the creation of a file of patron records linked to the ledger page by common registration numbers. in effect, this was our first "automation." 
the use of a master file in support of anumbered page provided information that had previously been handwritten every time someone wished to borrow books from the library. the new ledger system also allowed for a more orderly control of charges. only the borrower's number was needed to get at the page of transactions relating to that borrower, as opposed to the former methoda benchmark method, in a sensein which the transactions were chronologically entered and had no other ordering whatsoever. even with the improved ledger system, though, the only ordering was by borrower number and date of issue to the borrower. there was no arrangement that provided for sequencing or finding the books borrowed. the need to identify borrowed books led to the dummy system. every book had a concomitant dummy book (or large card) that had a ruled sheet of paper with the book identification information on it and the borrower's name and/or number. when a user wished to borrow a book, the dummy was pulled from a file and the borrower information was written on the sheet of paper. the dummy was then filed on the shelf occupying the space formerly occupied by the book itself. when the book was returned, it was reshelved, the dummy removed, and the circulation transaction was crossed out. this system is interesting in that it provides for a complete inventory control. either all items are on the shelf in proper sequence or a physical surrogate or record for circulating items is substituted and placed in proper sequence. one has instant and, in effect, "online" access to the presence or absence of materials if one has the call number and can go to the shelf. unlike most systems that can only tell whether or not the book is present, the dummy system tells who has the book and when it was charged. in terms of a minimal model, this system provided less and more than the ledger system. if a reader wanted a list of books he or she borrowed, the reader would have to view every dummy and see if the listed item was charged to him or her. in contrast, the ledger system served such a request well, though every page of the ledger might have to be examined to find out who had borrowed a book not found on the shelf. leaping past several systems, let us now discuss the newark system , the overwhelmingly prevalent system in the united states today (if we include the mechanical or electromechanical versions of dickman, gaylord (the manual, not automated), and demeo). the newark system incorporated the best features of the systems already mentioned. a separate registration file was kept which provided both alphabetic access by patron and numeric access by patron registration circulation systems/freedman 281 number. consequently, the recording of the borrower's identification during circulation transactions only involved the notation of the number. for book identification, a card and matching pocket were placed in each book with the call number and/or author-title identification information. the circulation transaction involved the removal of the card from the pocket and the entering on it, ala dummy system, the date of the transaction and the borrower number. the cards for all of the books borrowed on a given day were aggregated and filed in shelflist sequence in a tray headed by the date of the transactions. resorting to computer jargon, the major or primary sort of the book cards (read circulation cards) was by date, but the minor sort was by call number. 
consequently, if one wanted to know the status of a given book and one had the call number, it would not take too long to search, even with a file as large as the one in the main branch of newark public library, by looking for the item in all of the different days' charges. when a book was returned, the clerk noted from the date-of-issue card inserted in the book's pocket, the tray in which to search, and the matching call number on the pocket which was used for discharging the book, i.e., removing the charge card from the tray and replacing it in the book. the combination of the books on the shelf plus the cards in the different trays in shelflist order constituted a complete inventory. additionally, the trays of cards comprised a comprehensive record of all current charges, i.e., all transactions by date, call number, and borrower, with borrower number pointing to fuller information in the registration file. looking back at our basic model, the newark system offered not just the minimum-a record of the item and the borrower who took it-but also introduced a major step toward inventory control. there was an inventory sequence involved, or, more accurately, several inventory sequences-one for each given collection (or day) of circulation transactions. what was still missing was a record by borrower of what was charged to him or her. in the original newark system, the borrower's card had entered upon it dates of issue and return of items. this way, even if the library could not tell the user what items (s)he had, the user's card would reflect the number of items outstanding. the handling of reserves, renewals, and overdue notices occurred as follows: a colored clip or some indicator on a circulation card would be used to indicate a reserve. a renewal would be handled the same as a return except the person would wait while the charge card was pulled from the appropriately dated tray, and assuming that no reserves had been placed on the circulation card, the book would be recharged (i.e., renewed) to the borrower. overdues automatically presented themselves by default. cards left in a tray after a predetermined number of days represented charges for which overdues were to be sent. the tray was taken to the registration file and the numerically sequenced registration cards for the delinquent borrowers removed so that notices could be prepared and sent. then the 282 journal of library automation vol. 14/4 december 1981 registration slips and circulation cards had to be refiled at the completion of the process. essentially, most subsequent systems are variants on the newark system. the mcbee key-sort system involves the use of cards with prepunched holes around the edges, one of which can be notched to indicate the date an item is due. the cards are arranged by call number creating a single sequence. the insertion of a knitting needle .like device through a given hole will allow all of the books overdue for a given date to fall free of the deck. this system is like the newark system in that it has inventory and date access, but unlike newark it places a horrible burden on the borrower. each card has (written by the borrower) the borrower's name and address and the call number, author, and title of the book. thus, the library is saved the labor of creating circulation cards and maintaining registration records for every patron-all of the information needed is on the charge card. but here, as marvin scilken has pointed out, the burden of the library's tasks are merely passed on to the users. 
this point should be emphasized. the next system to be considered is the photo-charge system. microphotos are taken of the borrower's card, which has the name and address on it, the book card (as in the newark book identification card), and a sequentially numbered date-of-issue or date-due slip . again, as with the mcbee, since the photo record includes the borrower's name and address, one can throw away registration files. also, a list or range of transaction numbers is kept by date used. since the numbered date-of-issue slip is placed in the book at the time of charging, and one removes it when the book is returned, it is a simple step to cross off or remove the number on the slip from its corresponding duplicate on the list of numbers for that day's transactions. overdue transactions are found by searching for unchecked transaction numbers on the numerically sequenced microfilm. this system does meet the criterion of the minimal model, a record of the user's use of the item. in terms of labor intensity, one has eliminated the maintenance of charge-card files and registration files by a single microfilm record. reserves, though , are terribly time-consuming with the photo-charge system: each returned book, before it can be returned to the shelf or renewed, must be searched against a call-numbered sequence of reserve cards. academic libraries would not use this kind of system because call-number access is a necessity, especially in relation to recalls of longloaned items . the elimination of paper files is what so commended this system to public libraries over the newark-based systems. but, as was noted, one has virtually no way of determining who took a book out or when it is due back except, in principle, by searching all of the reels of microfilm. some variants on this microfilm system were developed. bro-dart marketed a system that thermographically produced eye-readable records instead of microimages . such was the state of circulation systems before computers began to be used. the following-a discussion of the involvement of computers-can circulation systems/freedman 283 be separated by the type of hardware: main frames, minicomputers, and microcomputers. the main-frame computer has been used primarily in the past as a processing unit for batches of circulation transactions collected and fed to it via punched cards, terminals, or minicomputers. call number and author and title (albeit brief) and user identification number, were captured for each transaction. in the 1960s and into the early 1970s, this information would be batch-processed by the computer and a variety of reports would be produced. what the computer does, then, is keeps track of numbers, their ranges, and the dates of the ranges. but the computer can do much more than this. it is capable, as none of the nonautomated systems were, of rearranging the data input and then comparing and tabulating them as desired and appropriate. consequently, the fact that the call number, author, and title are stored by the machine means that lists or files can be arranged by any of these elements. the same goes for date of transaction. as to borrower identification number, a master file much like the newark registration file is kept (only now in its machine-readable form), and the computer does the comparing at high speed instead of the clerk taking the charge record and going to the numeric file to find the name and address of the borrower. 
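the high-speed comparison described here is, in essence, a batch match of outstanding charge records against the registration master file. the fragment below is schematic only; the record layouts and sample data are invented, and no particular vendor's system is implied.

from datetime import date

# outstanding charges: (call number, borrower number, date due)
charges = [
    ("025.3 f61", 4217, date(1981, 6, 15)),
    ("629.8 k29", 1130, date(1981, 7, 1)),
]

# registration master file keyed by borrower number
patrons = {
    4217: ("j. smith", "12 elm st."),
    1130: ("m. jones", "40 oak ave."),
}

def overdue_notices(charges, patrons, today):
    # match each overdue charge with the patron master file, replacing the
    # clerk's trip to the numeric registration file
    for call_number, borrower_no, due in charges:
        if due < today:
            name, address = patrons[borrower_no]
            yield f"{name}, {address}: {call_number} was due {due}"

for notice in overdue_notices(charges, patrons, date(1981, 6, 30)):
    print(notice)   # j. smith, 12 elm st.: 025.3 f61 was due 1981-06-15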
of course, the computer can then readily and quickly print out overdue notices with an obvious absence of clerical support and labor intensity. as we all know, the rate of increase of labor costs in increasing, and the rate of increase of computer costs is decreasing. two kinds of large computer systems have been used. the batchoriented one, which either kept track of items in circulation only (the absence system-only items absent from the collection were tracked), or one that kept track of the entire collection (the inventory system). 4 normally, identification numbers were used for patrons in either system. although relatively rare in academic and public libraries, the mainframe-based online system is also in use. ohio state university is famous for its online system. what is meant here is that all transactions are immediately recorded and all files are instantly updated. printing is still necessary for overdue notices, but printed circulation lists are not necessary because of the online answers to queries regarding books or patrons now possible through terminals distributed to appropriate locations. the minicomputers came on the scene in two stages. clsi's entrance in 1973 utilized one of the early minicomputers, quite small by today's standards. for relatively small libraries that had not begun to dream of having their own computers, it became possible to have an entire inventory (in abbreviated form) and an entire patron file online. consequently, all of the access power of the newark system, and none of its labor intensity, was available online and much more besides. few libraries could afford the main-frame system of ohio state, but many could pay for clsi's, and indeed they did. in the last few years, minicomputers have grown several magnitudes 284 journal of library automation vol. 14/4 december 1981 above the capacity and speed of main-frame computers of the 1960s. consequently, such firms as dataphase, systems control, geac, gaylord, and others offer these larger minis, which can now support online the needs of large branch systems with inventories of hundreds of thousands of books. incidentally, clsi, with a new mini line, can do this now as well. both the miniand maxi-based systems do all of the basic work originally outlined: the whole inventory can be accessed online or with printed lists arranged by author, title, or call number (and, presently, some vendors offer online subject access and cross-references); access can also be made by patron's name. further, the basic transactionitem, borrower, and date-is recorded and checked for holds or delinquency before it is accepted. without overly extolling the present state of the art, it should be said that all of the information identified as important in the earliest systems is now not only available in a far quicker and more usable fashion, it can be manipulated by the machine in a variety of ways to meet and serve management objectives not considered practicable in the past. peter simmons showed how collection development could be aided by automatically generating purchase orders when reserves exceeded a specified acceptable level. 5 all kinds of statistical data regarding collection and patron use can be generated that could not have been possible in a manual mode. while at the university of southwestern louisiana, william mcgrath was able to adjust book budget allocations in terms of collection use and undergraduate major in a most interesting fashion. 6 the net result was an empirically based expenditure of book funds. 
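simmons' idea can be reduced to a single test over the reserve counts a circulation system already accumulates; the toy fragment below (threshold and data invented) flags titles whose outstanding reserves exceed an acceptable level so that an added-copy order can be generated.

# invented reserve counts per call number and an arbitrary threshold
reserves = {"025.3 f61": 7, "629.8 k29": 2, "821.9 e42": 11}
threshold = 5

order_candidates = [call for call, holds in reserves.items() if holds > threshold]
print(order_candidates)   # ['025.3 f61', '821.9 e42']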
now the microcomputer or microprocessor is the newly emerging phenomenon , and in many respects it is not unlike the minicomputer of the early 1970s. it is being used to perform single data-recording functions, and is also being seen as the link to the larger computer . so we have moved from chained books to microcomputers the size of a desk top. originally, a great deal of information was captured at great expense and laboriously maintained. certainly the handwritten and typed records of the newark system, although relatively comprehensive, were obtained and preserved at great cost. and, despite it all , there were real limitations of access . the succeeding mcbee and photo-charging systems appreciably cut out-of-pocket costs to the library, but either passed labor directly on to the user, or eliminated access altogether. book or patron access are virtually impossible with the photo-charging method. simply put, that system tells what is overdue, and that's all. the entry in the 1960s of the computer radically altered the ground rules. now all sequences of encoded elements are possible, and management information can be derived. important statistical data pertaining to collection use and library users can be obtained by further manipulating the data accumulated in the circulation process. it is now possible for all but the smallest and the very largest libraries to have access to and control circulation systems/freedman 285 of their materials through the current range of minicomputers on the market. jennie flexner told us that circulation had to be more than maintenance and record keeping of loan and borrower transactions. through the advances of the computer technology and its application to circulation control, we have finally seen what seems to be an optimization of the recordkeeping process and, by extension, an improvement in circulation service. if instantaneous access to patron files, inventory files, and outstanding transaction files through a variety of modes and computer-developed management data does not constitute that optimization, it will have to dountil the real thing comes along. acknowledgment the author is deeply indebted to susan e. bourgault for her editorial assistance. references 1. barbara evans markuson, "automated circulation control," library technology reports quly and sept., 1975), p.6. 2. jennie m. flexner, circulation work in public libraries (chicago: american library assn., 1927), p.l. 3. ibid., p.2. 4. robert mcgee, "two types of design for online circulation systems," journal of library automation 5:185 (sept. 1972). 5. peter simmons, collection development and the computer (vancouver, b.c.: univ. of british columbia, 1971), 60p. 6. william e. mcgrath, "a pragmatic allocation formula for academic and public libraries with a test for its effectiveness," library resources & technical services 19:356-69 (fall1975). maurice j. freedman is an associate professor at the school of library service, columbia university, new york city. 6 information technology and libraries | june 2008 metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1 jennifer bowen the extensible catalog (xc) project at the university of rochester will design and develop a set of open-source applications to provide libraries with an alternative way to reveal their collections to library users. the goals and functional requirements developed for xc reveal generalizable needs for metadata to support a next-generation discovery system. 
the strategies that the xc project team and xc partner institutions will use to address these issues can contribute to an agenda for attention and action within the library community to ensure that library metadata will continue to support online resource discovery in the future. library metadata, whether in the form of marc 21 catalog records or in a variety of newer metadata schemas, has served its purpose for library users by facilitating their discovery of library resources within online library catalogs (opacs), digital libraries, and institutional repositories. however, libraries now face the challenge of making this wealth of legacy catalog data function adequately within next-generation web discovery environments. approaching this challenge will require: n an understanding of the metadata itself and a commitment to deriving as much value from it as possible; n a vision for the capabilities of future technology; n an understanding of the needs of current (and, where possible, future) library users; and n a commitment to ensuring that lessons learned in this area inform the development of both future library systems and future metadata standards. the university of rochester ’s extensible catalog (xc) project will bring these various perspectives together to design and develop a set of open-source, collaboratively built next-generation discovery tools for libraries. the xc project team seeks to make the best possible use of legacy library metadata, while also informing the future development of discovery metadata for libraries. during phase 1 of the xc project (2006–2007), the xc project team created a plan for developing xc and defined the goals and initial functional requirements for the system. this paper outlines the major metadatarelated issues that the xc project team and xc partner institutions will need to address to build the xc system during phase 2. it also describes how the xc team and xc partners will address these issues, and concludes by presenting a number of issues for the broader library community to consider. while this paper focuses on the work of a single library project, the goals and functional requirements developed for the xc project reveal many generalizable needs for metadata to support a next-generation discovery system.1 the metadata-related goals of the xc project—to facilitate the use of marc metadata outside an integrated library system (ils), to combine marc metadata with metadata from other sources in a single discovery environment, and to facilitate new functionality (e.g., faceted browsing, user tagging)—are very similar to the goals of other library projects and commercial vendor discovery software. the issues described in this paper thus transcend their connection to the xc project and can be considered general needs for library discovery metadata in the near future. in addition to informing the library community about the xc project and encouraging comment on that work, the author hopes that identifying and describing metadata issues that are important for xc—and that are likely to be important for other projects as well—will encourage the library community to set these issues as high priorities for attention and action within the next few years. n the extensible catalog project the university of rochester’s vision for the extensible catalog (xc) is to design and develop a set of open-source applications that provide libraries with an alternative way to reveal their collections to library users. 
xc will provide easy access to all resources (both digital and physical collections) and will enable library content to be revealed through other web applications that libraries may already be using. xc will be released as open-source software, so it will be available for free download, and libraries will be able to adopt, customize, and extend the software to meet their local needs. the xc project is a collaborative effort between partner institutions that will serve a variety of roles in its development. phase 1 of the xc project, funded by the andrew w. mellon foundation and carried out by the university of rochester river campus libraries between april 2006 and june 2007, resulted in the creation of a project plan for the development of xc. during xc phase 1, the xc project team recruited a number of other institutions that will serve as xc partners and who have agreed to contribute resources toward building and implementing xc during phase 2. xc phase 2 (october 2007 through june 2009) is supported through additional funding from the andrew w. mellon foundation, the university of rochester, and xc partners. (jennifer bowen, jbowen@library.rochester.edu, is director of metadata management at the university of rochester river campus libraries, new york, and is co-principal investigator for the extensible catalog project.) during phase 2, the xc project team, assisted by xc partners, will deploy the xc software and make it available as open-source software.2 through its various components, the xc system will provide a platform for local development and experimentation that will ultimately allow libraries to manage and reveal their metadata through a variety of web applications such as web sites, institutional repositories, and content management systems. a library may choose to create its own customized local interface to xc, or use xc’s native user interface “as is.” the native xc interface will include web 2.0 functionality, such as tagging and faceted browsing of search results that will be informed by frbr (functional requirements for bibliographic records)3 and frad (functional requirements for authority data)4 conceptual models. the xc software will handle multiple metadata schemas, such as marc 215 and dublin core,6 and will be able to serve as a repository for both existing and future library metadata. in addition, xc will facilitate the creation and incorporation of user-created metadata, enabling such metadata to be enhanced, augmented, and redistributed in a variety of ways. the xc project team has designed a modular architecture for xc, as shown in the simplified schematic in figure 1. xc will bring together metadata from a variety of sources (integrated library systems, digital repositories, etc.), apply services to that metadata, and display it in a usable way in the web environments where users expect to find it.7 xc's architecture will allow institutions that implement the software to take advantage of innovative models for shared metadata services, which will be described in this paper. n xc phase 1 activities during the now-completed xc phase 1, the xc project team focused on six areas of activity: 1. survey and understand existing research on user practices. 2. gauge library demand for the xc system. 3. anticipate and prepare for the metadata requirements of the new system. 4. learn about and build on related projects. 5. experiment with and incorporate useful, freely available code. 6.
build a community of interest. the xc project team carried out a variety of research activities to inform the overall goals and high-level functional requirements for xc. this research included a literature search and ongoing monitoring of discussion lists and blogs, to allow the team to keep up with the most current discussions taking place about next-generation library discovery systems and related technologies and projects.8 the xc team also consulted regularly with prospective partners and other knowledgeable colleagues who are engaged in defining the concept of a next-generation library discovery system. in order to gauge library demand for the xc system, the team also conducted a survey of interested institutions.9 this paper reports the results of the third area of activity during xc phase 1—anticipating and preparing for the metadata requirements of the new system—and looks ahead to plans to develop the xc software during phase 2. n xc goals and metadata functional requirements the goals of the xc project have significant implications for the metadata functionality of the system, with each goal suggesting specific high-level functional requirements for how the system can achieve that particular goal. the five goals are: n goal 1: provide access to all library resources, digital and non-digital. n goal 2: bring metadata about library resources into a more open web environment. n goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing. n goal 4: conduct user research to inform system development. n goal 5: publish the xc code as open-source software. figure 1. xc system diagram 8 information technology and libraries | june 2008 an overview of each xc goal and its related high-level metadata requirements appears below. each requirement is then discussed in more detail, with a plan for how the xc project team will address that requirement when developing the xc software. n goal 1: provide access to all library resources, digital and non-digital working alongside a library’s current integrated library system (ils) and its other web applications, xc will strive to bring together access to all library resources, thus eliminating the data silos that are now likely to exist between a library’s opac and its various digital repositories and commercial databases. this goal suggests two fairly obvious metadata requirements (requirements 1 and 2). requirement 1—the system must be capable of acquiring and managing metadata from multiple sources: ilss, digital repositories, licensed databases, etc. a typical library currently has metadata pertaining to its collections residing in a variety of separate online systems: marc data in an ils, metadata in various schemas in digital collections and repositories, citation data in commercial databases, and other content on library web sites. a library that implements xc may want to populate the system with metadata from several online environments to simplify access to all types of resources. to achieve goal 1, xc must be capable of acquiring and managing metadata from all of these sources. each online environment and type of metadata present their own challenges. repurposing marc data repurposing marc metadata from an existing ils will be one of the biggest metadata tasks for a next-generation discovery system such as xc. in planning xc, we have assumed that most libraries will keep their current ils for the next few years or perhaps migrate to a newer commercial or open-source ils. 
in either case, most libraries will likely continue to rely on an ils’s staff functionality to handle materials acquisition, cataloging, circulation, etc. for the short term. relying upon an ils as a processing environment does not, however, mean that a library must use the opac portion of that ils as its means of resource discovery for users. xc will provide other options for resource retrieval by using web services to interact with the ils in the background.10 to repurpose ils metadata and enable it to be used in various web discovery environments, xc will harvest a copy of marc metadata records from an institution’s ils using the open archives initiative protocol for metadata harvesting (oai-pmh).11 using web services and standard protocols such as oaipmh offers not only a short-term solution for reusing metadata from an ils, but can also be used in both the shortand long-term to harvest metadata from any system that is oai-pmh harvestable, as will be discussed further below. while harvesting metadata from existing systems into xc creates duplication of metadata between an ils and xc, this actually has significant benefits. xc will handle metadata updates through automated harvesting services that minimize additional work for library staff, other than for setting up and managing the automated services themselves. the internal xc metadata cache can be easily regenerated from the original repositories and services when necessary, such as to enable future changes to the internal xc metadata schema. the xc system architecture also makes use of internal metadata duplication among xc’s components, which allows these components to communicate with each other using oaipmh. this built-in metadata redundancy will also enable xc to communicate with external services using this standard protocol. it is important to distinguish the deliberate metadata redundancies built into the xc architecture from the type of metadata redundancies that have been singled out for elimination in the library of congress working group on the future of bibliographic control draft report (recommendation 1.1)12 and previously in the university of california (uc) libraries bibliographic services task force’s final report.13 these other “negative” redundancies result from difficulties in sharing metadata among different environments and cause significant additional staff expense for libraries to enrich or recreate metadata locally. xc’s architecture actually solves many of these problems by facilitating the sharing of enriched metadata among xc users. xc can also adapt as the library community begins to address the types of costly metadata redundancies mentioned in the above reports, such as between the oclc worldcat database14 and copies of that marc data contained within a library’s ils, because xc will be capable of harvesting metadata from any source that uses a standard api.15 metadata from digital repositories and other free sources xc will harvest metadata from various digital collections and repositories, using oai-pmh, and will maintain a copy of the harvested metadata within the xc metadata cache, as shown in figure 1. the metadata services hub architecture provides flexibility and possible economy for xc users by offering the option for multiple xc institutions to share a single metadata hub, thus allowing participating institutions to take full advantage of the hub’s capabilities to aggregate and augment metadata from multiple sources. 
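as a concrete illustration of the harvesting step just described, the sketch below walks an oai-pmh listrecords response and follows resumption tokens until the repository is exhausted. it is a minimal sketch using only the python standard library; the endpoint url, the default metadata prefix, and the idea of writing each record into a local cache are assumptions made for the example, not details of the xc design.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvest(base_url, metadata_prefix="marc21"):
    """yield (identifier, metadata element) pairs from an oai-pmh repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        for record in tree.findall(".//oai:record", OAI):
            header = record.find("oai:header", OAI)
            identifier = header.findtext("oai:identifier", namespaces=OAI)
            yield identifier, record.find("oai:metadata", OAI)
        token = tree.findtext(".//oai:resumptionToken", namespaces=OAI)
        if not token:          # an absent or empty token ends the list
            break
        params = {"verb": "ListRecords", "resumptionToken": token}

# hypothetical endpoint; any oai-pmh-capable ils or repository would do:
# for oai_id, metadata in harvest("https://ils.example.edu/oai"):
#     cache.store(oai_id, metadata)

a scheduled job built around a loop of this kind is one plausible shape for the automated harvesting services mentioned above; the more interesting work happens in what the hub does with the records afterward.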
while the procedure for harvestmetadata to support next-generation library resource discovery | bowen 9 ing metadata from an external repository is not technologically difficult in itself, managing the flow of metadata coming from multiple sources and aggregating that metadata for use in xc will require the development of sophisticated software. to address this, the xc project team is partnering with established experts in bibliographic metadata aggregation to develop the metadata services portion of the xc architecture. the team from cornell university that has developed the software behind the national science digital library’s metadata management system (nsdl/mms)16 is advising the xc team in the development of the xc metadata services hub, which will be built on top of the basic nsdl/mms software. the xc metadata services hub will coordinate metadata services into a reusable task grouping that can be started on demand or scheduled to run regularly. this xc component will harvest xml metadata and combine metadata records that refer to equivalent resources (based on uniform resource identifier [uri], if available, or other unique identifier) into what the cornell team describes as a “mudball.” each mudball will contain the original metadata, the sources for the metadata, and the references to any services used to combine metadata into the mudball. the mudball may also contain metadata that is the result of further automated processing or services to improve quality or to explicitly identify relationships between resources. hub services could potentially record the source of each individual metadata statement within each mudball, which would then allow a metadata record to be redelivered in its original or in an enriched form when requested.17 by allowing for the capture of provenance data for each data element, the hub could potentially provide much more granular information about the origin of metadata—and much more flexibility for recombining metadata—than is possible in most marcbased environments. after using the redeployed nsdl/mms software as the foundation for the xc metadata hub, the xc project team will develop additional hub services to support xc’s functional requirements. xc-specific hub services will accommodate incoming marc data (including marc holdings data for non-digital resources); basic authority control; mappings from marc 21, marcxml,18 and dublin core to an internal xc schema defined within the xc application profile (described below); and other services to facilitate the functionality of the xc user environments (see discussion of requirement 5, below). finally, the xc hub services will make the metadata available for harvesting from the hub by the xc client integration applications. metadata for licensed content for a next-generation discovery system such as xc to provide access to all library resources, it will need to provide access to licensed content, such as citation data and full-text databases. metasearch technology provides one option for incorporating access to licensed content into xc. unfortunately, various difficulties with metasearch technology19 and usability issues with some metasearch products20 make metasearch technology a less-than-ideal solution. an alternative approach would bring metadata from licensed content directly into a system such as xc. 
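returning to the "mudball" aggregation described above, the following sketch shows one way such a grouping might be represented: harvested records that share a resource identifier are folded into a single structure that keeps every original record, its source, and a provenance note for each derived statement. the class name, field names, and the use of simple dictionaries are illustrative assumptions, not the nsdl/mms or xc data model.

from dataclasses import dataclass, field

@dataclass
class Mudball:
    """aggregated metadata about one resource, with per-statement provenance."""
    resource_id: str                                   # uri or other shared identifier
    originals: list = field(default_factory=list)      # (source, raw record) pairs
    statements: list = field(default_factory=list)     # (property, value, provenance)

def aggregate(records):
    """fold harvested records that share an identifier into mudballs."""
    mudballs = {}
    for source, resource_id, parsed in records:
        ball = mudballs.setdefault(resource_id, Mudball(resource_id))
        ball.originals.append((source, parsed))
        for prop, value in parsed.items():
            # note where each individual statement came from, so the hub can
            # later redeliver the record in original or enriched form
            ball.statements.append((prop, value, {"source": source}))
    return list(mudballs.values())

# hypothetical input: (source name, shared identifier, simple property dict)
sample = [
    ("ils",        "urn:example:123", {"title": "walden", "creator": "thoreau, henry david"}),
    ("repository", "urn:example:123", {"title": "walden", "subject": "nature"}),
]
for ball in aggregate(sample):
    print(ball.resource_id, len(ball.originals), "source records")

keeping provenance at the statement level, as the cornell design allows, is what would make it possible to hand a record back later in either its original or an enriched form.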
the metadata services hub architecture for xc is capable of handling the ingest and processing of metadata supplied by commercial content providers by adding additional services to handle the necessary schema transformations and to control access to the licensed content. the more difficult issue with licensed content may be to obtain the cooperation of commercial vendors to ingest their metadata into xc. pursuing individual agreements with vendors to negotiate rights to ingest their metadata is beyond the original scope of xc’s phase 2 project. however, the xc team will continue to monitor ongoing developments in this area, especially the work of the ethicshare project, which uses a system architecture very similar to that of xc.21 it remains our goal to build a system that will facilitate the inclusion of licensed content within xc in situations where commercial providers have made it available to xc users. requirement 1 summary when considering needed functionality for a next-generation discovery system, the ability to ingest and manage metadata from a variety of sources is of paramount importance. unlike a current ils, where we often think of metadata as mostly static unless it is supplemented by new, updated, and deleted records, we should instead envision the metadata in a next-generation system as being in constant motion, moving from one environment to another and being harvested and transformed on a scheduled basis. the metadata services hub architecture of the xc system will accommodate and facilitate such constant movement of metadata. requirement 2—the system must handle multiple metadata schemas. an extension of requirement 1 will be the necessity for a next-generation system such as xc to handle metadata from multiple schemas, as the system harvests those schemas from various sources. library metadata priorities as a part of the xc survey of libraries described earlier in this paper, the xt team queried respondents about what metadata schemas they currently use or plan to use in the near future. many responding libraries indicated that they expect to increase their use of non–marc 21 metadata within the next three years, although no library indicated the intention to completely move away from 10 information technology and libraries | june 2008 marc 21 within that time period. nevertheless, the idea of a “marc exit strategy” has been discussed in various circles.22 the architecture of xc will enable libraries to move beyond the constraints of a marc-based system without abandoning their ils, and will provide an opportunity for libraries to stage their “marc exit strategy” in a way that suits their purposes. libraries also indicated that they plan to move away from homegrown schemas toward accepted standards such as mets,23 mods,24 mads,25 premis,26 ead,27 vra core,28 and dublin core.29 several responding libraries plan to move toward a wider variety of metadata schemas in the near future, and will focus on using xmlbased schemas to facilitate interoperability and metadata harvesting. to address the needs of these libraries in the future, xc’s metadata services will contain a variety of transformation services to handle a variety of schemas. 
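to make the idea of a transformation service concrete, the sketch below registers one transform per incoming schema and shows a deliberately tiny marcxml-to-dublin-core mapping. the handful of tag mappings follows common marc-to-dublin-core crosswalk practice but is nowhere near complete, and the function and table names are assumptions for the example only.

import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# a tiny, deliberately incomplete (tag, subfield) -> dc element mapping
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}

def marcxml_to_dc(record):
    """transform one marcxml <record> element into a simple dc-style dict."""
    dc = {}
    for datafield in record.findall("marc:datafield", MARC_NS):
        tag = datafield.get("tag")
        for subfield in datafield.findall("marc:subfield", MARC_NS):
            element = MARC_TO_DC.get((tag, subfield.get("code")))
            if element:
                dc.setdefault(element, []).append((subfield.text or "").strip())
    return dc

# the hub can register one transform per incoming schema and dispatch on it;
# a real oai_dc service would still map elements rather than pass them through
TRANSFORMS = {
    "marcxml": marcxml_to_dc,
    "oai_dc":  lambda record: record,
}

def transform(schema, record):
    return TRANSFORMS[schema](record)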
taking into account the metadata schemas mentioned the most often among survey respondents, the software developed during phase 2 of the xc project will support harvested metadata in marc 21, marcxml, and dublin core (including qualified dublin core).30 metadata crosswalks and mapping one respondent to the xc survey offered the prediction that “reuse of existing metadata and transformation of metadata from one format to another will become commonplace and routine.”31 xc’s internal metadata transformations must be designed with this in mind, to facilitate making these activities “commonplace and routine.” fortunately, many maps and crosswalks already exist that potentially can be incorporated into a next-generation system such as xc.32 the metadata services hub architecture for xc can function as a standard framework for applying a variety of existing crosswalks within a single, shared environment. following “best practices” for crosswalking metadata, such as those developed by the digital library federation (dlf),33 will be extremely important in this environment. as the dlf guidelines describe, metadata schema transformation is not as straightforward as it might first appear to be. while the dlf guidelines advise always crosswalking from a more robust schema to a simpler one, sometimes in a series of steps, such mapping will often result in “dumbing down” of metadata, or loss of granularity. this is a particularly important concern for the xc project because a large percentage of the metadata handled by xc will be rich legacy marc 21 metadata, and we hope to maintain as much of that richness as possible within the xc system. in addition to simply mapping one data element in a schema to its closest equivalent in another, it is essential to ensure that the underlying metadata models of the two schemas being crosswalked are compatible. the authors of the framework for a bibliographic future draft document define multiple layers of such models that need to be considered,34 and offer a general highlevel comparison between the frbr data model35 and the dcmi (dublin core metadata initiative) abstract model (dcam).36 more detailed comparisons of models are also taking place as a part of the development of the new metadata content standard, resource description and access (rda).37 the developers of rda have issued documents offering a detailed mapping of rda elements to rda’s underlying model (frbr)38 and analyzing the relationship between rda elements, the dcmi abstract model, and the metadata framework.39 as a result of a meeting held april 30–may 1, 2007, a joint dcmi/rda task group is now undertaking the collaborative work necessary to carry out the following tasks: n develop an rda element vocabulary. n develop an rda/dublin core application profile based on frbr and frad. n disclose rda value vocabularies using rdf/ rdfs/skos.40 these efforts hold much potential to provide a more rigorous way to communicate about metadata across multiple communities and to increase the compatibility of different metadata schemas and their underlying models. such compatibility will be essential to enabling the functionality of future discovery systems such as xc. an xc metadata application profile the xc project team will define a metadata application profile for xc as a way to document decisions made about data elements, content standards, and crosswalking used within the system. 
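one way to keep such an application profile usable by software as well as by people is to record it as machine-readable configuration that the hub and the user environments consult. the fragment below is a sketch under that assumption; the element names, obligation values, and the small validation routine are invented for illustration and are not the xc application profile itself.

# a fragment of an application profile kept as configuration; element names,
# source schemas, and obligation values are illustrative assumptions only.
XC_PROFILE = {
    "title": {
        "source_elements": ["marc21:245$a", "dc:title"],
        "obligation": "mandatory",
        "repeatable": False,
    },
    "subject": {
        "source_elements": ["marc21:650$a", "dc:subject"],
        "obligation": "optional",
        "repeatable": True,
        "vocabulary": "lcsh",          # or fast, local terms, etc.
    },
    "user_tag": {
        "source_elements": [],         # created inside the system, never harvested
        "obligation": "optional",
        "repeatable": True,
        "provenance": "user-generated",
    },
}

def validate(record):
    """report profile violations instead of silently dropping data."""
    problems = []
    for name, rule in XC_PROFILE.items():
        values = record.get(name, [])
        if rule["obligation"] == "mandatory" and not values:
            problems.append(f"missing mandatory element: {name}")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"non-repeatable element repeated: {name}")
    return problems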
the use of an application profile can facilitate metadata migration, harvesting, and other automated processes, and presents an approach to metadata that is more flexible and responsive to local needs than simply adopting someone else’s metadata guidelines.41 application profiles facilitate the use of multiple schemas because elements can be selected for inclusion from more than one existing schema, or additional elements can be created and defined locally.42 because the xc system will incorporate harvested metadata from a variety of sources, the use of an application profile will be essential to support xc’s complex system requirements. the dcmi community has published guidelines for creating a dublin core application profile (dcap), which is defined more specifically as: [a] form for documenting which terms a given application uses in its metadata, with what extensions or adaptations, and specifying how those terms relate both to formal standards such as dublin core as well as to less formally defined element sets and vocabularies.43 metadata to support next-generation library resource discovery | bowen 11 the announcement of plans to develop an rda/ dublin core application profile illustrates the important role that application profiles are beginning to take to facilitate the interoperability of metadata schemas. the planned rda/dc application profile will “translate” rda into a standard structure that will allow it to be related more easily to other metadata element sets. unfortunately, the rda/dc application profile will likely not be completed in time for it to be incorporated into the first release of the xc software in mid-2009. nevertheless, we intend to use the existing definitions of rda elements to inform the development of the xc application profile.44 this will allow us to anticipate any future incompatibilities between the rda/dc and the xc application profiles, and ensure that xc will be wellpositioned to take advantage of rda-based metadata when rda is implemented. this process may have the reciprocal benefit of also informing the developers of rda of any rda elements that may be difficult to implement within a next-generation system such as xc. the potential value of rda to the xc project—in terms of providing a consistent approach to bibliographic and authority metadata and facilitating frbr-related user functionality—is very significant. it is hoped that at some point xc can become an early adopter of rda and provide a mechanism through which libraries can move their legacy marc 21 metadata into a system that is compatible with an emerging international metadata standard. n goal 2: bring metadata about library resources into a more open web environment xc will reveal library metadata not only through its own separate interface (either the out-of-the-box xc interface or an interface designed by the local library), but will also allow library metadata to be revealed through other web applications. the latter approach will bring library resources directly to web locations that library users are already visiting, rather than attempting to entice users to visit an additional library-specific web location. making library metadata work effectively in the broader web environment (outside the well-defined boundaries of an ils or repository) will require the following requirements 3 and 4: requirement 3—metadata must conform to the standards of the new web environments as well as to that of the system from which it originated. 
achieving requirement 3 will require library metadata in future systems to perform a dual function: to conform to both existing library standards as well as to web standards and conventions. one way to achieve this is to ensure that the two types of standards themselves are compatible. coyle and hillmann have argued persuasively for changes in the direction of rda development to allow metadata created using rda to function in the broader web environment. these changes include the need to follow a clearly refined, high-level metadata model, to create data elements that can be manipulated by machines, and to move toward the use of uris instead of textual identifiers.45 after the announcement of the outcomes of the rda/dc data modeling meeting, the two authors are considerably more optimistic about rda functioning as a standard within the broader web environment.46 this discourse concerning rda shows but a piece of the process through which long-established library metadata standards need to be reexamined to make library metadata understandable to both humans and machines on the web. moving away from aacr2 toward rda, and ultimately toward incorporating standard web conventions into library metadata, can be a difficult process for those involved in creating and maintaining library standards. nevertheless, transforming library metadata standards in this way is essential to fulfill the requirements necessary for next-generation library discovery systems. requirement 4—metadata must function effectively within the new web environments as well as within the system from which it originated. not only must metadata for a next-generation system follow the conventions and standards used in the broader web, but the data also needs to be able to function effectively in a broader web environment. this is a slightly different proposition from requirement 3, and will necessitate testing the metadata standards themselves to ensure that they enable library metadata to function effectively. the xc project will provide direct experience with using library metadata in two types of web environments: content management systems and learning management systems. library metadata in a content management system as shown in the xc architecture diagram in figure 1, the xc project team will build one of the primary user environments for xc on top of the open-source content management system, drupal.47 the xc drupal module will allow us to respond to many of the needs expressed by libraries in their responses to the xc survey48 by supplying: n a web application server with a back-end database; 12 information technology and libraries | june 2008 n a user interface with web 2.0 features; n library-controlled web pages that will treat library metadata as a native data type; n a metadata interface for enhancing or correcting metadata in the system; and n an administrative interface. the xc team will bring library metadata into the drupal content management system (cms) as a native content type within that environment, creating a drupal “node” for each metadata record. this will allow xc to take advantage of many native features of the drupal cms, such as a taxonomy system.49 building xc interfaces on top of the drupal cms will also give us an opportunity to collaborate with partner libraries that are already active participants in the drupal user community. xc’s architecture will allow the possibility of developing additional user environments on top of other content management systems. 
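since drupal itself is written in php and its module api is not reproduced here, the sketch below only illustrates, in a cms-neutral way, what "one node per metadata record" might look like: each aggregated record becomes a node-like structure whose subject values feed the cms taxonomy system for browsing. the content type name and field layout are assumptions for the example.

# cms-neutral sketch of "one node per metadata record"; field names and the
# content type are hypothetical, not the drupal (php) api.
def record_to_node(resource_id, dc_record):
    """map an aggregated record onto a generic cms node structure."""
    return {
        "type": "xc_metadata_record",            # hypothetical content type name
        "title": (dc_record.get("title") or ["untitled"])[0],
        "fields": {
            "creator": dc_record.get("creator", []),
            "date": dc_record.get("date", []),
            "source_id": resource_id,
        },
        # feeding subjects into the cms taxonomy system enables browse pages
        "taxonomy": {"subject": dc_record.get("subject", [])},
    }

node = record_to_node("urn:example:123",
                      {"title": ["walden"], "subject": ["nature"]})
print(node["title"], node["taxonomy"]["subject"])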
bringing library metadata into these new environments will provide many new opportunities for libraries to manipulate their metadata and present it to users without being constrained by the limitations of the current generation of library systems. such opportunities will then inform the future requirements for library metadata in such environments. library metadata in a learning management system figure 1 illustrates two examples of xc user environments through learning management systems: xc interfaces to both the blackboard learning system50 and sakai.51 much exciting work is being done at other institutions to bring library content into these web applications.52 xc will build on projects such as these to reveal library metadata for non-licensed library resources from an ils through learning management systems. specifically, we plan to develop the capability for libraries to make the display of library metadata context-sensitive within the learning management system. for example, searching or browsing on a page for a particular academic course could be configured to reflect the subject area of the course (e.g., chemistry) and automatically present library resources related to that subject.53 this capability will build upon the experiences gained by the university of rochester through its work to develop its “course resources” system.54 such xc functionality will be integrated directly into the learning management system, rather than simply providing a link out to a separate library system. again, we hope that our efforts to bring library metadata into these new environments will encourage libraries to engage in further work to integrate library resources into broader web environments and inform future requirements for library metadata in these environments. n goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing new functionality for users will require that metadata fulfill more sophisticated functions in a next-generation system than it may have done in an ils or repository, in order to provide more intuitive searching and navigation. the system will also need to capture and incorporate metadata generated through tagging, user-contributed reviews, etc. such new functionality creates the need for requirements 5 and 6. requirement 5—metadata must support functionality to facilitate intuitive searching and navigation, such as faceted browsing and frbrinformed results groupings. enabling faceting and clustering much research has already been done regarding the design of faceted search interfaces in general.55 when considered along with user research conducted at other institutions56 and to be conducted during the development of xc, this data provides a strong foundation for the design of a faceted browse environment. the xc project team has already gained firsthand experience with developing faceted browsing through the development of the “c4” prototype interface during phase 1 of the xc project.57 to enable faceting within xc, we will also pay particular attention to what others have discovered through designing faceted interfaces on top of legacy marc 21 metadata. specific lessons learned from those involved with north carolina state university’s endeca-based catalog,58 vanderbilt university’s primo implementation,59 and plymouth state university’s scriblio system60 provide valuable guidance for the xc project team as we design facets for the xc system. 
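as a simplified illustration of facet derivation from marc 21 6xx fields, in the spirit of the c4 prototype and the ncsu work cited above, the sketch below splits subject strings into topic, genre, era, and region facets by subdivision subfield. the subfield-to-facet assignments are a simplification, and, as the ncsu experience suggests, real headings need far more cleanup than this.

# simplified facet derivation from marc 21 6xx fields; the assignments below
# are an assumption for illustration, not a recommended best practice.
SUBFIELD_FACETS = {
    "a": "topic",       # main heading
    "x": "topic",       # general (topical) subdivision
    "v": "genre",       # form subdivision
    "y": "era",         # chronological subdivision
    "z": "region",      # geographic subdivision
}

def facets_from_6xx(datafields):
    """datafields: iterable of (tag, [(subfield code, value), ...]) tuples."""
    facets = {"topic": set(), "genre": set(), "era": set(), "region": set()}
    for tag, subfields in datafields:
        if not tag.startswith("6"):
            continue
        for code, value in subfields:
            facet = SUBFIELD_FACETS.get(code)
            if facet and value:
                facets[facet].add(value.strip(" .;"))
    return facets

example = [("650", [("a", "Walking"), ("z", "Massachusetts"),
                    ("v", "Early works to 1800.")])]
print(facets_from_6xx(example))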
ideally, a mechanism should be developed to enable these discoveries to feed back into the development of metadata and encoding standards, so that changes to existing standards can be considered to facilitate faceting in the future. several new system implementations have used library of congress subject headings (lcsh) and lc subdivisions from marc 21 records as the basis for deriving facets. the xc “c4” prototype interface provides facets for topic, genre, and region that are based simply upon one or more marc 21 6xx tags.61 north carolina state university’s endeca-based system has enabled facets for topic, genre, region, and era using lcsh subdivisions as well, but this has necessitated a “massive cleanup” of subdivisions, as described by charley pennell.62 oclc’s fast (faceted application of subject terminology) project may provide another option for enabling such facets.63 a library could populate its marc 21 data with fast headings, based metadata to support next-generation library resource discovery | bowen 13 upon the existing lcsh in the records, and then use the fast headings as the basis for generating facets. it remains to be seen whether fast will offer significant benefit over lcsh itself when it comes to faceting, however, since fast headings are generated directly from lcsh. while marc 21 metadata has some known difficulties where faceting and clustering are concerned (such as those involving lcsh), the xc system will encounter additional difficulties when implementing these technologies with less robust metadata schemas such as simple dublin core, and especially across metadata from a variety of schemas. the development of web services to augment batches of metadata records in an automated manner holds some promise for improving the creation of facets from other metadata schemas. within the xc system, such services could be added to the metadata services hub and run against ingested metadata. while designing extensive services of this type is beyond the scope of the next phase of xc software development, we will encourage others to develop such services for xc. another (but much less desirable) approach to augmenting metadata is for a metadata specialist to manually edit one record or group of records. the xc cataloging interface, built within the drupal cms, will allow recordby-record editing of metadata when necessary. while we see this editing interface as essential functionality for xc, we anticipate that libraries will want to use this feature sparingly. in many cases it will be preferable to correct or augment metadata within its original repository (e.g., the institution’s ils) and then re-harvest the corrected metadata, rather than correcting it manually within xc itself. because of the expense of manual metadata augmentation and correction, libraries will be well-advised to rely upon insights gained through user research to assess the value of this type of work. for example, a library might decide to edit individual metadata records only when the correction or augmentation will support specific system functionality that is of high priority for the institution’s users. implementing frbr results groupings to incorporate logical groupings of search results based upon the frbr64 and frad65 data models over sets of diverse metadata within xc, we will encounter similar difficulties that we face with faceting and clustering. 
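below is a minimal sketch of the kind of hub augmentation service suggested above, here attaching a rough work-level grouping key derived from creator and title to each record. the normalization rules are deliberately naive assumptions; the example's failure to merge two editions of the same work is the point, since grouping quality depends on the completeness and authority control of the underlying metadata.

import re
import unicodedata

def normalize(text):
    """crude normalization for matching: strip accents, case, punctuation."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def work_key(record):
    """derive a rough work-level grouping key from creator + title.
    a naive stand-in for the frbrization methods discussed below; records
    without authority-controlled headings will group poorly."""
    creator = (record.get("creator") or [""])[0]
    title = (record.get("title") or [""])[0]
    return normalize(creator) + "|" + normalize(title)

def group_by_work(records):
    groups = {}
    for record in records:
        groups.setdefault(work_key(record), []).append(record)
    return groups

editions = [
    {"creator": ["Thoreau, Henry David"], "title": ["Walden"]},
    {"creator": ["Thoreau, Henry David, 1817-1862"],
     "title": ["Walden; or, Life in the woods"]},
]
print(len(group_by_work(editions)))   # 2: naive keys miss the match, hence the
                                      # need for augmentation and authority work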
various analyses of the marc 21 formats have dealt extensively with the relationship between frbr and marc 21,66 and others have written specifically about methodology for frbrizing a marc-based catalog.67 in addition, various tools and web services are available that can potentially facilitate this process.68 even with this extensive body of work to draw upon, however, the success of our implementation of frbr-based functionality will depend upon both the quality and completeness of the system’s metadata. metadata in xc that originated as dublin core records may need significant augmentation to be incorporated effectively into frbrized results displays. to maximize the ability of the system to support frbr/frad results groupings, we may need to supplement automated grouping of resources with a combination of additional services for the metadata services hub, and with cataloger-generated metadata correction and augmentation, as described above.69 the xc team will use the results of user research carried out during the next phase of the xc project to inform our decision-making regarding what frbr-informed results grouping users find helpful, and then assess what specific metadata augmentation services are needed for xc. providing frbr-informed groupings of related records in search results will be easier when the underlying metadata incorporates principles of authority control. of course, the vast majority of the non-marc metadata that will be ingested into xc will not be under authority control. again, this situation suggests the need for additional services or functionality to improve existing metadata within the xc metadata hub, the xc cataloging interface, or both. as an experiment in developing services to facilitate authority control, the xc project team carried out a pilot project in partnership with a group of software engineering students from the rochester institute of technology (rit) during phase 1 of xc. the rit students designed a basic name access control tool that can be used across disparate metadata schemas in an environment such as xc. the tool can ingest marc 21 authority and bibliographic records as well as dublin core records, provide automated matching, and facilitate a cataloger’s handling of problem reports.70 the xc project team will implement the automated portion of the tool as a web service within the xc hub, and the “cataloger facilitation” portion of the tool within the xc cataloging user interface. institutions that use xc can then incorporate additional tools to facilitate authority control into xc as they are needed and developed. in addition to providing a test case for developing xc metadata services, the rit pilot project proved valuable by providing an opportunity for student software developers and catalogers to discuss the functional requirements of a cataloging tool. not only did the experience enable the developers to understand the needs of the system’s intended users, but it also presented an opportunity for the engineering students to demonstrate technological possibilities that the catalogers—who work almost exclusively with legacy ils technology—may not have envisioned before participating in the project. requirement 6—the system must manage usergenerated metadata resulting from user tagging, submission of reviews, etc. 
because users now expect web-based tools to offer web 2.0 functionalities, the xc project has as one of its basic 14 information technology and libraries | june 2008 goals to incorporate these functionalities into xc’s user environments. the results of the xc survey rank tools to support the finding, gathering, use, and reuse of scholarly content (e.g., rss feeds, blogs, tagging, user reviews) eighth out of a list of twenty new desirable opac features.71 we expect to learn much more about the usefulness of web 2.0 technology within a next-generation system through the user research that we will carry out during phase 2 of the xc project. the xc system will capture metadata generated by users from any one of the system’s user environments (e.g., drupal-based interface, learning management system integration) and harvest it back into the system’s metadata services hub for processing.72 the xc application profile will incorporate user-generated metadata, mapped into its own carefully defined metadata elements. this will allow us to capture and manage this metadata as discrete content, without inadvertently mixing it with other metadata created by library staff or ingested from other sources. n goal 4: conduct user research to inform system development user research will be essential to informing the design and functionality of the xc software. to align xc’s functional requirements as closely as possible with user needs, the xc project team will practice a user-centered design methodology that takes an iterative approach to defining the system’s functional requirements. since we will engage concurrently in the processes of user research and software design, we will not fully determine the system requirements for xc until a significant amount of user research has been done. a complete picture of the demands upon metadata within xc will thus emerge as we gain information from our user research. n goal 5: publish the xc code as open-source software central to the vision of the xc project is sharing the xc software freely throughout the library community and beyond. our hope is that others will use all or part of the xc software, modify it, and improve it to meet their own needs. new requirements for the metadata within xc are likely to arise as this process takes place. other future changes to the xc software will also be needed to ensure the software’s continued compatibility with various metadata standards and schemas. these changes will all affect the system requirements for xc over time. addressing goals 4 and 5 while goals 1 through 3 for the xc project result in specific high-level functional requirements for the system’s discovery metadata that can be addressed and discussed as xc is being developed, goals 4 and 5 present general challenges that must be addressed in the future. goal 4 is likely to fuel the need to update the xc software over time as the needs of users change. goal 5 provides a challenge to managing that updating process in a collaborative environment. these two goals suggest an additional general requirement for the system’s metadata requirement 7: requirement 7—the system’s metadata must be extensible to facilitate future enhancements and updates. enabling future user needs developing xc using a user-centered design process in which user research and software design occur simultaneously will enable us to design and build a system that is as responsive as possible to the needs of users that are seeking library resources. 
however, user needs will change during the life of the xc software. these needs must be assessed and addressed, and then weighed against the desires of individual institutions that use xc and who request specific system enhancements. to carry forward the xc project’s commitment to serving users, we will develop a governance model for the xc community that brings the needs of future users into the decision-making process by providing a method for continuing to determine and capture user needs. in addition, we will consciously cultivate a commitment to user research among members of the xc community. because the xc software will be released as open source, we can also encourage xc partners to develop whatever additional functionality they need for their own institutions and make these enhancements available to the entire community of xc users. this approach is very different from the enhancement process in place for most commercial systems, and xc partner institutions may need to adjust to this approach. enabling future metadata standards as current metadata standards are revised and new standards and schemas are created, xc must be able to accommodate these changes. new crosswalks will allow new metadata schemas to be mapped to the xc internal schema in the future. the xc application profile can be updated with the addition of new data elements as needed. the drupal-based xc user environment will also allow institutions that use xc to create new internal data types to incorporate additional types of metadata. as the development of the semantic web moves forward73 and enables smart linking between existing authority files and vocabularies,74 xc’s architecture can make use of the resulting web services, either by incorporating them metadata to support next-generation library resource discovery | bowen 15 through the xc metadata services hub or through the native xc user interface as part of a user search query. n further considerations the above discussion of the goals and requirements for xc has revealed a number of issues related to the development of next-generation discovery systems that are unfortunately beyond the scope of the next phase of the xc project. we therefore offer them as a possible agenda for future work by the broader library community: 1. explore the wider usefulness of web-based metadata services and the need for an automated metadata services coordinator to control these functions. libraries are already comfortable with basic “services” that are performed on metadata by an outside agency: for example, a library may send copies of its marc records to a vendor for authority processing or enrichment with tables of contents or other data elements. the library community should encourage vendors and others to develop these and other metadata enrichment options as automated web services. 2. study the advantages of using statement-level metadata provenance, as used in the nsdl metadata management system and considered for use within the xc metadata services hub, and explore whether there are ways that marc 21 could move toward allowing more granularity in recording and sharing metadata provenance. 3. to facilitate access to licensed library resources, encourage the development of more robust metasearch technology and standards so that technological limitations do not hinder system performance and search result usability. 
if this is not successful, libraries and content providers must work together to enable metadata for licensed resources to be revealed within open discovery environments such as xc and ethicshare.75 this second scenario will enable libraries to directly address usability issues with the display of licensed content, which may make it a more desirable longer-term solution than attempting to improve metasearch technology. 4. the administrative bodies of the two groups represented on the dcmi/rda task group (i.e., the dublin core metadata initiative and the rda committee of principals) have a responsibility to take the lead in funding this group’s work to develop and maintain the rda/dc application profile and its related registries and vocabularies. beyond this, however, the broader library community must recognize that this work is essential to ensure that future library metadata standards will function in the broader web environment, and offer additional administrative and financial support for it in the coming years. 5. to ensure that library standards work effectively outside of traditional library systems, catalogers and metadata experts must develop ongoing, collaborative working relationships with system developers. such collaboration will necessitate educating each group of experts about the domain of the other. 6. libraries should experiment with using metadata in new environments and use the lessons learned from this activity to inform the metadata standards development process. while current library automation environments by and large do not provide opportunities for this, the extensible catalog will provide a flexible platform where experimentation can take place.76 xc will make experimentation as risk-free as possible by ensuring that the original metadata brought into the system can be reharvested in its original form, thus minimizing concerns about possible data corruption. xc will also minimize the investment needed for a library to engage in this experimentation because it will be released as open-source software. 7. to facilitate new functionality for next-generation library discovery environments, libraries must share their new expertise in this area with each other. for example, library professional organizations (such as ala and its associations) should form discussion groups and committees devoted to sharing lessons learned from the implementation of faceted interfaces and web 2.0 technologies, such as tagging and folksonomies. such groups should develop a “best practices” document outlining a preferred way to define facets from marc 21 data that can be used by any library implementing faceting on top of its legacy metadata. 8. the library community should discuss and encourage mechanisms for pooling and sharing usergenerated metadata among libraries and other interested institutions. n conclusions to present library resources via the web in a manner that users now expect, library metadata must function in ways that have never been required of it before. making library metadata function effectively within the broader web environment will require that libraries take advantage of the combined knowledge of experts in the areas of cataloging/metadata and system development who share a 16 information technology and libraries | june 2008 common vision for serving library users. 
the challenges to making legacy library metadata and newer metadata for digital resources interact effectively in the broader web environment are significant, and work must begin now to ensure that we can preserve the investment that libraries have made in their legacy metadata. while the recommendations within this report are the result of planning to develop one particular library discovery system—the extensible catalog (xc)—these lessons can inform the development of other systems as well. the actual development of xc will continue to add to our knowledge in this area. while it may be tempting to wait and see what commercial vendors offer as their next generation of commercial discovery products, such a passive approach may jeopardize the future viability of library metadata. projects such as the extensible catalog can serve as a vehicle for moving forward by providing an opportunity for libraries to experiment and to then take informed action to move the library community toward a next generation of resource discovery systems. acknowledgments phase 1 of the extensible catalog project was funded through a grant from the andrew w. mellon foundation. this paper is in partial fulfillment of that grant, originally funded on april 1, 2006, and concluding on june 30, 2007. the author acknowledges the contributions of the entire university of rochester extensible catalog project team to the content of this paper, and especially thanks david lindahl, barbara tillett, and konstantin gurevich for reading and offering suggestions on drafts of this paper. references and notes 1. despite the use of the word “catalog” within the name of the extensible catalog project, this paper will avoid using the word “catalog” in the phrase “next-generation catalog” because this may misleadingly convey the idea of a catalog as solely a single, separate web destination for library users. instead, terms such as “discovery environment” and “discovery system” will be preferred. 2. the xc blog provides a list of xc partners, describes their roles in xc phase 2, and provides links to reports that represent the outcomes of xc phase 1. “xc (extensible catalog): an opensource online system that will unify access to traditional and digital library resources,” www.extensiblecatalog.info (accessed october 4, 2007). 3. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998), www.ifla.org/vii/s13/frbr/ frbr.pdf (accessed july 23, 2007). 4. ifla working group on functional requirements and numbering of authority records (franar), “functional requirements for authority data: a conceptual model,” april 1, 2007, www.ifla.org/vii/d4/franar-conceptualmodel2ndreview.pdf (accessed july 23, 2007). 5. library of congress, network development and marc standards office, “marc 21 formats,” april 18, 2005, www.loc .gov/marc/marcdocz.html (accessed september 3, 2007). 6. “dublin core metadata element set, version 1.1,” december 20, 2004, http://dublincore.org/documents/dces (accessed september 3, 2007). 7. university of rochester river campus libraries, “extensible catalog phase 2,” (grant proposal submitted to the andrew w. mellon foundation, july 11, 2007). 8. “literature list,” extensible catalog blog, www. extensiblecatalog.info/?page_id=17 (accessed august 27, 2007). 9. a summary of the results of this survey is available on the xc blog. 
nancy fried foster et al., “extensible catalog survey report,” july 20, 2007, www.extensiblecatalog.info/wp-content/ uploads/2007/07/xc%20survey%20report.pdf (accessed july 23, 2007). 10. lorcan dempsey has written of the need for a service layer for libraries that would facilitate the “de-coupling” of resource retrieval from back-end processing. lorcan dempsey, “a palindromic ils service layer,” lorcan dempsey’s weblog, january 20, 2006, http://orweblog.oclc.org/archives/000927. html (accessed august 24, 2007). 11. “open archives initiative protocol for metadata harvesting v. 2.0,” www.openarchives.org/oai/openarchivesprotocol. html (accessed august 27, 2007). 12. library of congress, working group on the future of bibliographic control, “report on the future of bibliographic control: draft for public comment,” november 30, 2007, www .loc.gov/bibliographic-future/news/lcwg-report-draft-11-3007-final.pdf (accessed december 30, 2007). 13. university of california libraries bibliographic services task force, “rethinking how we provide bibliographic services for the university of california,” final report, 34, http://libraries. universityofcalifornia.edu/sopag/bstf/final.pdf (accessed august 24, 2007). 14. “[worldcat.org] search for an item in libraries near you,” www.worldcat.org (accessed august 24, 2007). 15. oclc’s plan to create additional apis to worldcat as part of its worldcat grid project is a welcome development that may enable oclc members to harvest metadata directly from worldcat into a system such as xc in the future. see the following blog posting for an early description of oclc’s plans, which have not been formally unveiled by oclc as of this writing: bess sadler, “the librarians and the chocolate factory: oclc developer network day,” solvitur ambulando, october 3, 2007, www.ibiblio.org/bess/?p=88 (accessed december 30, 2007). 16. “metadata management system,” nsdl registry, september 20, 2006, http://metadataregistry.org/wiki/index.php/ metadata_management_system (accessed july 23, 2007). 17. diane hillmann, stuart sutton, and jon phipps, “nsdl metadata improvement and augmentation services,”(grant proposal submitted to the national science foundation, 2007). 18. library of congress, network development and marc standards office, “marcxml: marc 21 xml schema,” july 26, 2006, www.loc.gov/standards/marcxml (accessed september 3, 2007). metadata to support next-generation library resource discovery | bowen 17 19. andrew k. pace, “category: metasearch,” hectic pace, http://blogs.ala.org/pace.php?cat=150 (accessed august 27, 2007). see in particular the following blog entries: “metameta,” july 25, 2006; “more meta,” september 29, 2006; “preaching to the publishers,” oct 31, 2006; “even more meta,” july 11, 2007; and “still here,” august 21, 2007. 20. david lindahl, “metasearch in the users’ context,” the serials librarian 51, no. 3/4 (2007): 220–222. 21. ethicshare, a collaborative project of the university of minnesota, georgetown university, indiana university–bloomington, indiana university–purdue university indianapolis, and the university of virginia, is addressing this challenge as part of its plan to develop a sustainable online environment for the practical ethics community. the architecture of the proposed ethicshare system has many similarities to that of xc, but the project focuses specifically upon ingesting citation metadata from a variety of sources, including commercial providers. 
see cecily marcus, “ethicshare planning phase final report,” july 2007, www.lib.umn.edu/about/ethicshare/university%20 of%20minnesota_ethicshare_final_report.pdf (accessed august 27, 2007). 22. roy tennant used this phrase in “marc exit strategies,” library journal 127, no. 19 (november 15, 2002), www.libraryjournal.com/article/ca256611.html?q=tennant+exit (accessed july 23, 2007); karen coyle presented her vision for moving beyond marc to a more flexible, identifier-based record structure that will facilitate a range of library functions in “future considerations: the functional library systems record,” library hi tech 22, no. 2 (2004). 23. library of congress, network development and marc standards office, “mets: metadata encoding and transmission standard official web site,” august 23, 2007, www.loc.gov/ standards/mets (accessed september 3, 2007). 24. library of congress, network development and marc standards office, “mods: metadata object description schema,” august 22, 2007, www.loc.gov/standards/mods (accessed september 3, 2007). 25. library of congress, network development and marc standards office, “mads: metadata authority description schema,” february 2, 2007, www.loc.gov/standards/mads (accessed september 3, 2007). 26. “premis: preservation metadata maintenance activity,” july 31, 2007, www.loc.gov/standards/premis (accessed september 3, 2007). 27. library of congress, network development and marc standards office, “ead: encoded archival description version 2002 official site,” august 17, 2007, www.loc.gov/ead (accessed september 3, 2007). 28. visual resources association, “vra core: welcome to the vra core 4.0,” www.vraweb.org/projects/vracore4 (accessed september 3, 2007). 29. “dublin core metadata element set, version 1.1.” 30. other xml-compatible schemas, such as mods and mads, will also be supported initially in xc if they are first converted into marc xml or qualified dublin core. in the future, we plan to allow these other schemas to be harvested directly into xc. 31. foster et al., “extensible catalog survey report,” july 20, 2007, 15. the original comment was submitted by meg bellinger in yale university’s response to the xc survey. 32. patricia harpring et al., “metadata standards crosswalks,” in introduction to metadata: pathways to digital information (getty research institute, n.d.), www.getty.edu/research/ conducting_research/standards/intrometadata/crosswalks. html (accessed august 29, 2007); see also carol jean godby, jeffrey a. young, and eric childress, “a repository of metadata crosswalks,” d-lib magazine 10, no. 12 (december 2004), www .dlib.org/dlib/december04/godby/12godby.html (accessed july 23, 2007). 33. digital library federation, “crosswalkinglogic,” june 22, 2007, http://webservices.itcs.umich.edu/mediawiki/oaibp/ index.php/crosswalkinglogic (accessed august 28, 2007). 34. karen coyle et al., “framework for a bibliographic future,” may 2007, http://futurelib.pbwiki.com/framework (accessed july 23, 2007). 35. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 36. andy powell et al., “dcmi abstract model,” dublin core metadata initiative, june 4, 2007, http://dublincore.org/ documents/abstract-model (accessed august 29, 2007). 37. joint steering committee for development of rda, “rda: resource description and access: background,” july 16, 2007, www.collectionscanada.ca/jsc/rda.html (accessed august 29, 2007). 38. 
joint steering committee for development of rda, “rda-frbr mapping,” june 14, 2007, www.collectionscanada .ca/jsc/docs/5rda-frbrmapping.pdf (accessed august 29, 2007). 39. joint steering committee for development of rda, “rda element analysis,” june 14, 2007, www.collectionscanada.ca/ jsc/docs/5rda-elementanalysis.pdf (accessed august 28, 2007). a revised version of the document was issued on december 16, 2007, at www.collectionscanada.gc.ca/jsc/docs/5rda-element analysisrev.pdf (accessed december 30, 2007). 40. “data model meeting: british library, london 30 april–1 may 2007,” www.bl.uk/services/bibliographic/meeting.html (accessed july 23, 2007). the task group has outlined its work plan, including deliverables, on its wiki at http://dublincore .org/dcmirdataskgroup (accessed october 4, 2007). 41. emily a hicks, jody perkins, and margaret beecher maurer, “application profile development for consortial digital libraries,” library resources and technical services 51, no. 2 (april 2007). 42. makx dekkers, “application profiles, or how to mix and match metadata schemas,” cultivate interactive, january 2001, www.cultivate-int.org/issue3/schemas (accessed august 29, 2007). 43. thomas baker et al., “dublin core application profile guidelines,” september 3, 2005, http://dublincore.org/usage/ documents/profile-guidelines (accessed october 8, 2007). 44. joint steering committee for development of rda, “rda element analysis.” 45. karen coyle and diane hillmann, “resource description and access (rda): cataloging rules for the 20th century,” d-lib magazine 13, no. 1/2 (jan./feb. 2007), www.dlib.org/dlib/ january07/coyle/01coyle.html (accessed august 24, 2007). 46. karen coyle, “astonishing announcement: rda goes 2.0,” coyle’s information, may 3, 2007, http://kcoyle.blogspot .com/2007/05/astonishing-announcement-rda-goes-20.html (accessed august 29, 2007). 18 information technology and libraries | june 2008 47. “drupal.org,” http://drupal.org (accessed august 30, 2007). 48. foster et al., “extensible catalog survey report,” 14. 49. “taxonomy: a way to organize your content,” drupal.org, http://drupal.org/handbook/modules/taxonomy (accessed september 12, 2007). 50. “blackboard learning system,” www.blackboard.com/ products/academic_suite/learning_system/index.bb (accessed august 31, 2007). 51. “sakai: collaboration and learning environment for education,” http://sakaiproject.org (accessed august 31, 2007). 52. for example, the library into blackboard project at california state fullerton has developed a toolkit for faculty that brings openurl resolver functionality into blackboard to create linked citations to resources. see “putting the library into blackboard: a toolkit for cal state fullerton faculty,” 2005, www .library.fullerton.edu/librarytoolkit/default.shtml (accessed august 31, 2007); and susan tschabrun, “putting the library into blackboard: using the sfx openurl generator to create a toolkit for faculty.” the sakaibrary project at indiana university and the university of michigan are working to integrate licensed library content into sakai using metasearch technology. see “sakaibrary: integrating licensed library resources with sakai,” june 28, 2007, www.dlib.indiana.edu/projects/sakai (accessed august 31, 2007). 53. university of rochester river campus libraries, “extensible catalog phase 2.” 54. susan gibbons, “library course management systems: an overview,” library technology reports 41, no. 3 (may/june 2005): 34–37. 55. marti a. 
hearst, “design recommendations for hierarchical faceted search interfaces,” august 2006, http:// flamenco.berkeley.edu/papers/faceted-workshop06.pdf (accessed august 31, 2007). 56. kristin antelman, emily lynema, and andrew k. pace, “toward a twenty-first century library catalog,” information technology and libraries 25, no. 3 (september 2006): 128–138. 57. “c4,” https://www.library.rochester.edu/c4 (accessed september 28, 2007). as of the time of this writing, the c4 prototype is available to the public. however, the prototype is no longer being developed, and this prototype may cease to be available at some point in the future. 58. charley pennell, “forward to the past: resurrecting faceted search @ ncsu libraries,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.lib.ncsu.edu/endeca/ presentations/200706-facetedcatalogs-pennell.ppt (accessed august 31, 2007). 59. mary charles lasater, “authority control meets faceted browse: vanderbilt and primo,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.ala.org/ala/lita/litamembership/ litaigs/authorityalcts/2007annualfiles/marycharleslasater.ppt (accessed august 31, 2007). 60. casey bisson, “faceting and clustering: an implementation report based on scriblio,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), http://oz.plymouth.edu/~cbisson/ presentations/alaannual_2-2007june24.pdf (accessed august 31, 2007). 61. “subject access fields (6xx),” in marc 21 concise format for bibliographic data (2006), www.loc.gov/marc/bibliographic/ ecbdsubj.html (accessed september 28, 2007). 62. pennell, “forward to the past: resurrecting faceted search@ ncsu libraries.” 63. “fast: faceted application of subject terminology,” www.oclc.org/research/projects/fast (accessed august 31, 2007). 64. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 65. ifla working group on functional requirements and numbering of authority records (franar), “functional requirements for authority data.” 66. library of congress, network development and marc standards office, “functional analysis of the marc 21 bibliographic and holding formats,” april 6, 2006, www.loc. gov/marc/marc-functional-analysis/functional-analysis.html (accessed august 31, 2007); martha m. yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 2 (june 2005): 77–95; pat riva, “mapping marc 21 linking entry fields to frbr and tillett’s taxonomy of bibliographic relationships,” library resources and technical services 48, no. 2 (april 2004): 130–143. 67. trond aalberg, “a process and tool for the conversion of marc records to a normalized frbr implementation,” in digital libraries: achievements, challenges and opportunities (berlin/heidelberg: springer, 2006), 283–292; christian monch and trond aalberg, “automatic conversion from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2003): 405–411; david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib magazine 11, no. 10 (october 2005), www .dlib.org/dlib/october05/crane/10crane.html (accessed august 24, 2007). 68. 
trond aalberg, frank berg haugen, and ole husby, “a tool for converting from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2006), 453–456; “frbr work-set algorithm,” www .oclc.org/research/software/frbr/default.htm (accessed august 31, 2007); “xisbn (web service),” www.worldcat .org/affiliate/webservices/xisbn/app.jsp (accessed august 31, 2007). 69. for example, marc 21 data may need to be augmented to extract data attributes related to frbr works and expressions that are not explicitly coded within a marc 21 bibliographic record (such as a date associated with a work coded within a general note field); or to “sort out” the fields in a marc 21 bibliographic record for a single resource that contains various works and/or expressions (e.g. ,a sound recording with multiple tracks), to associate the various fields (performer access points, analytical entries, subject headings, etc.) with the appropriate work or expression. 70. while the rit-developed tool is not publicly available at the time of this writing, it is our intent to post it to sourceforge (www.sourceforge.net) in the near future. the final report of the rit project is available at http://docushare.lib.rochester.edu/ docushare/dsweb/get/document-27362 (accessed january 2, 2008). metadata to support next-generation library resource discovery | bowen 19 71. foster et al., “extensible catalog survey report.” 72. note the arrow pointing to the left in figure 1 between the user environments and the metadata services hub. 73. jane greenberg and eva mendez, knitting the semantic web (binghamton, ny: haworth information press, 2007). this volume, co-published simultaneously as cataloging and classification quarterly 43, no. 3/4, contains a wealth of articles that explore the role that libraries can, and should, play in the development of the semantic web. 74. corey a. harper and barbara b. tillett explore various methods for making these controlled vocabularies available in “library of congress controlled vocabularies and their application to the semantic web,” cataloging and classification quarterly 43, no. 3/4 (2007): 63. the development of skos (simple knowledge organization system), a semantic web language for representing controlled structured vocabularies, will also be valuable for xc. see alistair miles and jose r. perez-aguiera, “skos: simple knowledge organisation for the web,” catalogingand classification quarterly 43, no. 3/4 (2007). 75. marcus, “ethicshare planning phase final report.” 76. the talis platform provides another promising environment for experimentation and development. see “talis platform: semantic web application platform,” talis, www.talis.com/ platform (accessed september 2, 2007). usability as a method for assessing discovery | ipri, yunkin, and brown 181 tom ipri, michael yunkin, and jeanne m. brown usability as a method for assessing discovery the university of nevada las vegas libraries engaged in three projects that helped identify areas of its website that had inhibited discovery of services and resources. these projects also helped generate staff interest in the usability working group, which led these endeavors. the first project studied student responses to the site. the second focused on a usability test with the libraries’ peer research coaches and resulted in a presentation of those findings to the libraries staff. the final project involved a specialized test, the results of which also were presented to staff. 
all three of these projects led to improvements to the website and will inform a larger redesign. u sability testing has been a component of the university of nevada las vegas (unlv) libraries web management since our first usability studies in 2000.1 usability studies are a widely used and relatively standard set of tools for gaining insight into web functionality. these tests can explore issues such as the effectiveness of interactive forms or the complexity of accessing full-text articles from third-party databases. they can explore aesthetic and other emotional responses to a site. in addition, they can provide an opportunity to collect input concerning satisfaction with the layout and logic of the site. they can reveal mistakes on the site, such as coding errors, incorrect or broken links, and problematic wording. they also allow us to engage in testing issues of discovery to isolate site elements that facilitate or hamper discovery of the libraries’ resources and services. the libraries’ usability working group seized upon two library-wide opportunities to highlight findings of the past year’s studies. the first was the discovery summit, in which the staff viewed videos of staff attempting finding exercises on the homepage and discussed the finding process. the second was the discovery mini-conference, an outgrowth of a new evaluation framework and the libraries’ strategic plan. through a poster display, the working group highlighted areas dealing with discovery of library resources. the mini-conference allowed us to leverage library-wide interest in the topic of effective information-finding on the web to draw wider attention to usability’s importance in identifying the likelihood of our users discovering library resources independently. the usability working group engaged in three projects to help identify areas of the website that inhibited discovery and to generate staff interest in the process of usability. all three of these projects led to improvements to the website and will inform a larger redesign. the first project is an ongoing effort to study student responses to the site. the second was to administer a usability test with the libraries’ peer research coaches and present those findings to the libraries’ staff. the final project was requested by the dean of libraries and involved a specialized test, the results of which also were presented to staff. n student studies the usability working group began its ongoing evaluation of unlv libraries’ website by conducting two series of tests: one with five undergraduate students and one with five graduate students. not surprisingly, most students self-reported that the main reason they come to the libraries’ site is to find books and journal articles for assignments. the group created a set of fourteen tasks that were based on common needs for completing assignments: 1. find a journal article on the death penalty. (note: if students go somewhere other than the library, guide them back.) 2. find what floor the book the catcher in the rye is on. 3. find the most current issue of the journal popular mechanics. 4. identify a way to ask a question from home. 5. find a video on global warming. 6. you need to write a bibliography for a paper. find something on the website that would help you. 7. find out what lied library’s hours were for july 4. 8. find the libraries’ tutorial on finding books in the library. 9. the library offers workshops on how to use the library. find one you can take. 10. 
find a library-recommended website in business. 11. find out what books are checked out on this card. 12. find instructions for printing from your personal laptop. 13. your sociology professor, dr. lampert, has placed something on reserve for your class. please find the material. 14. your professor wants you to read the book efficiency and complexity in grammars by john a. hawkins. find a copy of the book for your assignment. (the tom ipri (tom.ipri@unlv.edu) is head, media and computer services; michael yunkin (michael.yunkin@unlv.edu) is web content manager/usability specialist; and jeanne m. brown (jeanne.brown@unlv.edu) is head, architecture studies library and assessment librarian, university of nevada las vegas libraries. 182 information technology and libraries | december 2009 moderator will prompt if the person stops at the catalog.) the results of these tests revealed that the site was not as conducive to discovery as was hoped. the libraries are planning on a complete redesign of the site in the near future; however, the results of these first two series of usability tests were compelling enough to prompt an intermediary redesign to improve some of the areas that were troublesome to students. that said, the tests also found certain parts of the old site (figure 1) to be very effective: 1. all participants used the tabbed box in the center of the page, which gives them access to the catalog, serials lists, databases, and reserves. 2. all students quickly found the “ask a librarian” link when prompted to find a way to ask a question from home. 3. most students found the libraries’ hours, partly because of the “hours” tab at the top of the page and partly because of multiple access points. 4. many participants used the “site search” tab to navigate to the search page, but few actually used it to conduct searches. they effectively used the site map information also included on the search page. the usability tests also revealed some variables that undermined the goal of discoverability. 1. due to the various sources of library-related information (website, catalog, vendor databases) navigation posed problems for students. although not a specific question in the usability tests, the results show students often struggled to get back to the libraries’ home page to start a new question. 2. students often expected to find different content under “help and instruction” than what was there. 3. students used the drop down boxes as a last resort. often, they would expand a drop down box and quickly navigate away without selecting anything from the list. 4. with some exceptions, students mainly ignored the tabs across the top of the home page. 5. although students made good use of the tabbed box in the center of the page, many could not distinguish between “journals” and “articles & databases.” 6. similarly, students easily found the “reserves” tab but could not make sense of the difference between “electronic reserves (e-reserves)” and “other reserves.” 7. no student found business resources via the “subject guides” drop down menu at the bottom of the home page. n peer-coach test and staff presentation unlv libraries employs peer research coaches, undergraduate students who serve as frontline research mentors to their peers. the usability working group administered the same test they used with the first group of undergraduate and graduate students to the peer research coaches. although these students are trained in library research, they still struggled with some of the usability tasks. 
the usability working group presented the findings of the peer research coach tests with staff. the peer research coaches are highly regarded in the libraries, so staff were surprised that they had so much difficulty navigating the site; this presentation was the first time many of the staff had seen the results of usability studies of the site. the shocking nature of these results generated a great deal of interest among the staff regarding the work of the usability working group. n the dean’s project in january 2009, the dean of libraries asked the usability working group for assistance in planning for the discovery summit. initially, she requested to view figure 1. unlv libraries’ original website design usability as a method for assessing discovery | ipri, yunkin, and brown 183 the video from some of the usability tests with the goal of identifying discovery-oriented problems on the libraries’ website. soon after, the dean tasked the group with performing a new set of usability tests using three subjects: a librarian, a library employee with little research or web expertise, and a faculty researcher. each participant was asked to complete three tasks, first using the libraries’ website, then using google. the tasks were based on items found in the libraries’ special collections: 1. find a photograph available in unlv libraries of the basic magnesium mine in henderson, nevada. 2. find some information about the baneberry nuclear test. are there any documents in unlv libraries about the lawsuit associated with the test? 3. find some information about the local greenpeace chapter. are there any documents in unlv libraries about the las vegas chapter? the dean viewed those videos and chose the most interesting clips for a presentation at the discovery summit. prior to this meeting, the libraries’ staff were instructed to try completing the tasks on their own so that they might see the potential difficulties users must overcome and to compare the user experience provided by our website with that provided by google. at the discovery summit, the dean presented the staff a number of clips from these special usability tests, giving the staff an opportunity to see where users familiar with the libraries collections stumble. the staff also were shown several clips of undergraduates using the website to perform basic tasks, such as finding journal articles or videos in the libraries, with varying degrees of success. these clips helped illustrate the various difficulties users encounter when attempting to discover library holdings, including unfamiliar search interfaces, library jargon, and a lack of clear relationships between the catalog and other databases. this discussion helped set the stage for the discovery mini-conference. n initial changes to the site unlv libraries’ website is in the process of being redesigned, and the results of the usability studies are being used to inform that process. however, because of the seriousness of some of the issues, some changes are being implemented into an intermediary design (figure 2). 
the new homepage n combines article and journal searching into one tab and removes the word “databases” from the page entirely; n adds a website search to the tabbed box; n adds a “music & video” search option; n makes better use of the picture on the page by incorporating rotating advertisements in that area; n widens the page, allowing more space on the rest of the site’s templates; n breaks the confusing “help & instruction” page into two more specific pages: “help” and “using the libraries”; and n adds the main library and the branch library hours to the homepage. this new homepage is just the beginning of our efforts to improve discovery through the libraries’ website. the usability working group already has plans to do a card sort for the “using the library” category to further refine the content and language of that section. the group plans to test the initial changes to the site to ensure that they are improving discovery. reference 1. jennifer church, jeanne brown, and diane vanderpol, “walking the web: usability testing of navigational pathways at the university of nevada las vegas libraries,” in usability assessment of library-related web sites: methods and case studies, ed. nicole campbell (chicago: ala, 2001). figure 2. unlv libraries’ new website design hutchinson this study focuses on the adoption and use of wireless technology by medium-sized academic libraries, based on responses from eighty-eight institutions. results indicate that wireless networks are already available in many medium-sized academic libraries and that respondents from these institutions feel this technology is beneficial. w ireless networking offers a way to meet the needs of an increasingly mobile, tech-savvy student population. while many research libraries offer wireless access to their patrons, academic libraries serving smaller populations must heavily weigh both the potential benefits and disadvantages of this new technology. will wireless networks become essential components of the modern academic library, or is this new technology just a passing fad? prompted by plans to implement a wireless network at the houston cole library (hcl) (jacksonville state university’s [jsu’s] library), which serves a student enrollment close to ten thousand, this study was conducted to gather information about whether libraries similar in size and mission to hcl have adopted wireless technology. the study also sought to find out what, if any, problems other libraries have encountered with wireless networks and how successful they have perceived those networks to be. other questions addressed include level of technical support offered, planning, type of equipment used to access the network, and patron-use levels. � review of literature a review of the literature on wireless networks revealed a number of articles on wireless networks and checkout programs for laptop computers at large research institutions. seventy percent of major research libraries surveyed by kwon and soules in 2003 offered some degree of wireless access to their networks.1 no articles, however, specifically addressed the use of wireless networks in medium-sized academic libraries. many articles can also be found on wireless-network use in medical libraries and other institutions. library instruction using wireless classrooms and laptops has been another subject of inquiry as well. 
breeding wrote that there are a number of successful uses for wireless technology in libraries, and a wireless local area network (wlan) can be a natural extension of existing networks. he added that since it is sometimes difficult to install wiring in library buildings, wireless is more cost effective.2 a yearly survey conducted by the campus computing project found that the number of schools planning for and deploying wireless networks rose dramatically from 2002 to 2003. “for example, the portion of campuses reporting strategic plans for wireless networks rose to 45.5 percent in fall 2003, up from 34.7 percent in 2002 and 24.3 percent in 2001.”3 the use of wireless access in academia is expected to keep growing. according to a summary of a study conducted by the educause center for applied research (ecar), the higher-education community will keep investing in the technology infrastructure, and institutions will continue to refine and update networks. the move toward wireless access “represents a user-centered shift, providing students and faculty with greater access than ever before.”4 in an article on ubiquitous computing, drew provides a straightforward look at how wlans work, security issues, planning, and the uses and ramifications of wireless technology in libraries. he suggests, “perhaps one of the most important reasons for implementing wireless networking across an entire campus or in a library is the highly mobile lifestyle of students and faculty.” the use of wireless will only increase with the advent of new portable devices, he added. wireless networking is the best and least expensive way for students, faculty, and staff to take their office with them wherever they go.5 the circulation of laptop computers is a frequent topic in the available literature. the 2003 study by kwon and soules primarily focused on laptop-lending services in academic-research libraries. fifty percent of the institutions that responded to their survey provided laptops for checkout. the majority indicated moderate-to-high use of laptop services. positive user response and improved “public reputation, image, and relations” were the greatest advantages reported with laptop circulation. the major disadvantages associated with these services were related to labor and cost.6 a study of laptop checkout service at the mildred f. sawyer library at suffolk university in boston revealed that laptop usage was popular during the fall semester of 1999. students checked out the computers to work on group projects. a laptop area was set aside on one library floor to provide wired internet access for eight users. however, students wanted to use the laptops anywhere, not one designated place. the wired laptop areas were not popular, dugan wrote, adding that “few students used the wired area and the wires were repeatedly stolen or intentionally broken.” an interim phase involved providing wireless network cards for checkout wireless networks in medium-sized academic libraries: a national survey paula barnett-ellis and laurie charnigo paula barnett-ellis (pbarnett@jsucc.jsu.edu) is health and sciences librarian, and laurie charnigo (charnigo@jsucc .jsu.edu) is education librarian at houston cole library, jacksonville state university, alabama. 
wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 13 14 information technology and libraries | march 2005 to encourage patrons to use their own laptops, and, when a wireless network was put into place in the fall of 2000, demand exceeded the number of available laptops for checkout.7 � method a survey (see appendix) was designed to find out how many libraries similar in size and mission to hcl have adopted wireless networks, the experiences they have encountered in offering wireless access, and, most importantly, whether they felt the investment in wireless technology has been worth the effort.8 the national center for education statistic’s academic library peer comparison tool, a database composed of statistical information on libraries throughout the united states, was used to select institutions for this study. a search on this database retrieved eighty-eight academic libraries that met two criteria: full-time enrollments of between five thousand and ten thousand, and classification by the carnegie classification of higher education as master’s colleges and universities i.9 the survey was administered to those thought most likely to be responsible for systems in the library; they were selected from staff listings on library web sites (library systems administrator, information tech-nology [it] staff). if such a person could not be identified, the survey was sent to the head of library systems or to the library director. the survey was divided into the following sections: implementation of wireless network, planning and installation stages, user services, technical problems, and benefits specific to use of network. surveys were mailed out in march 2004. an internet address was provided in the cover letter if participants wished to take the survey online rather than return it by mail. an e-mail reminder with a link to the online survey was sent out three weeks after the initial survey was mailed. all letters and e-mails were personalized, and a self-addressed stamped envelope and a ballpoint pen with the jsu logo were included with the mail surveys. in the e-mail reminder, the authors offered to share the results of the project with anyone who was interested, and received several enthusiastic responses. � results a total of fifty-three completed surveys were returned, resulting in a response rate of 60 percent. the overwhelming majority (85 percent) responded that their library offered wireless-network access. even if the thirty-five surveys that were not returned had reported that wireless networks were not available, more than 50 percent would still have offered wireless networks. survey results also pointed to the newness of the technology. only four of the fifty-three institutions have had wireless networks for more than three years. the majority (73 percent) has implemented wireless networks just within the last two years. when asked to identify the major reasons for offering wireless networks to their patrons, the three responses most chosen were: (1) to provide greater access to users; (2) the flexibility of a network unfettered by the limitations of tedious wiring; and (3) to keep up with technological innovation (see table 1). least significant factors in the decision to implement wireless networks were cost; use by library faculty and staff; to aid in bibliographic instruction; and use for carrying out technical services (taking inventory). 
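the “check all that apply” percentages reported here and in the accompanying tables are each option’s response count divided by the fifty-three returned surveys; a minimal sketch of that tally follows, with illustrative response data and option labels only, not the survey’s actual records.

```python
# tally "check all that apply" survey answers: for each option, report how many
# respondents selected it and what share of all returned surveys that represents.
from collections import Counter

# hypothetical responses: one set of selected options per returned survey
responses = [
    {"provide greater access to users", "flexibility"},
    {"provide greater access to users", "campuswide initiative"},
    {"flexibility", "keep up with technological innovation"},
]

total = len(responses)
counts = Counter(option for survey in responses for option in survey)

for option, n in counts.most_common():
    print(f"{option}: {n} ({100 * n / total:.0f}%)")
```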
somewhat to the authors’ surprise, wireless use in bibliographic instruction was not high on the list of reasons for installing a wireless network, identified by only 9 percent of respondents. the benefits of wireless for library instruction was stressed in the literature by mathias and heser and patton.10 in addition to obtaining an instrument for gauging how many libraries similar in scope and size to hcl have implemented wireless networks and why they chose to do so, questions on the survey were also designed to gather information on planning and implementation, user services, technical problems, and perceived benefits. � planning and implementation although tolson mentions that some schools have used committees composed of faculty, staff, and students to look into the adoption of wireless technology, responses from this survey indicated that the majority (60 percent) of the libraries did not form committees specifically for the planning of their wireless networks.11 in addition, 49 percent of the libraries took fewer than six months to plan for implementation of a network, 37 percent required six months to one year, and 15 percent reported more than one to two years. actual time spent on installation and configuration of wireless networks was relatively short, 98 percent indicating less than one year (see table 2 for specific times). one of the most important issues to consider when planning to implement a wireless network is extent of coverage—where wireless access will be available. survey responses revealed varying degrees of wireless coverage among institutions. twenty percent had campus-wide access, 55 percent had some level of coverage throughout the entire library, 37 percent provided a limited range of coverage outside the building, and 20 percent offered access only in certain areas within the library. according to a bulletin published by ecar, institutions vary in their approaches to networking depending on enrollment. smaller colleges and universities with fewer than ten thousand students are “more likely to implement campuswide wireless networks from the start. larger institutions are more likely to implement wireless technology in specific buildings, consistent with a desire to move forward at a modest pace, as resources and comfort with the technology grow.”12 questions on the survey also queried respondents about the popularity of spaces in the library where users access the library’s wireless network. answers revealed that the most popular areas for wireless access are study carrels, tables, and study rooms. nineteen percent indicated that accessing wireless networks in the stacks is popular. of particular concern to hcl, a thirteen-story building, was how the environment of the library would accommodate a wireless network. a thorough site survey is important to locate the best spots within the library to install access points and to determine whether there are architectural barriers in the building that might interfere with access. the majority of survey respondents indicated that the site survey conducted in their library for a wireless network was carried out by their academic institution’s it staff (59 percent). while library staff conducted 35 percent of site surveys, only 17 percent were conducted by outside companies. � user services an issue to be addressed by libraries deciding to go wireless is whether laptop computers should also be provided for checkout in the library. 
after all, it might be hard to justify the usefulness of a wireless network if users do not have access to laptops or other hardware with wireless capabilities. while one individual reported working at a “laptop university” in which campuswide wireless networking exists and all students are required to own laptops, not all college students will have that luxury. in order to provide more equal access to students, checking out laptops has become an increasingly common service in academic libraries. seventy percent of this survey’s respondents whose institutions offered wireless access also made laptops available for checkout. comments made throughout the survey seemed to imply that while checking out laptops to patrons is an invaluable complement to offering wireless access, librarians should be prepared for a myriad of hassles that accompany laptop checkout. wear and tear of laptops, massive battery use, cost of laptops, and maintenance were some of the biggest problems reported. one participant, whose institution decided to stop offering laptops for checkout to patrons in the library, wrote, “it required too much staff time to maintain and we decided the money was better spent elsewhere. the college now encourages students to purchase a laptop [instead of] a full-sized pc.” one participant worried that the rising use of laptops in his library would lead to the obsolescence of its more than one hundred wired desktops, writing, “our desktops are very popular and we think having them is one of the reasons our gate count has increased in recent years. what happens when everyone has a laptop?” the number of laptops checked out in the libraries varied. the majority of libraries had purchased between one and thirty laptops available for checkout (see table 3). three institutions had more than forty-one laptops available for checkout. one library could boast that it had sixty laptops available for checkout, with twelve pagers to notify students waiting in line to use laptops. when asked about the use of laptops in libraries, 46 percent observed moderate use, while 32 percent reported heavy use of laptops. only 3 percent indicated that they hardly ever noticed use of laptops in the library.

for those students who chose to bring their own laptop to access the library’s wireless network, half of the institutions surveyed required students to purchase their own network-interface cards for their laptops, while 19 percent allowed students to check them out from the library. in addition to laptops, personal digital assistants (pdas) were listed by 37 percent of respondents as devices that may access wireless networks. one librarian indicated that cell phones could access the wireless network in his library. fifty-six percent of respondents indicated that users are able to print to a central printer in the library from their wireless device.

table 1. main reasons for implementing a wireless network in absolute numbers and percentages (number of responses; percent of responses out of total)
provide greater access to users: 36 (67%)
flexibility (no wires, ease in setting up): 29 (54%)
to keep up with or provide technological innovation: 28 (52%)
campuswide initiative: 21 (39%)
requests expressed by users: 16 (30%)
provide greater online access due to shortage of computers per user in the library: 15 (28%)
other: 7 (13%)
offer network access outside the library building: 6 (11%)
aid in bibliographic instruction: 5 (9%)
for use by library faculty and staff: 5 (9%)
low cost: 5 (9%)
to carry out technical services (such as inventory): 4 (7%)

table 2. total length of time taken to completely configure and install the wireless network (number of responses; percent of responses out of total)
less than one month: 12 (28%)
one to two months: 11 (26%)
more than two months to four months: 10 (23%)
more than four months to six months: 4 (9%)
more than six months to one year: 5 (12%)
more than one year: 1 (2%)

table 3. total number of laptops available for checkout in the library (number of responses; percent of responses out of total)
one to five: 8 (26%)
six to ten: 5 (16%)
eleven to fifteen: 1 (3%)
sixteen to twenty: 5 (16%)
twenty-one to thirty: 8 (26%)
thirty-one to forty: 1 (3%)
more than forty: 3 (10%)

an important consideration for implementing a wireless network is how users will authenticate. authentication protocol is defined by the microsoft encyclopedia of networking as “any protocol used for validating the identity of a user to determine whether to grant the user access to resources over a network.”13 authentication methods listed by the institutions surveyed varied greatly, and the authors could not identify all of them. methods mentioned were lightweight directory access protocol (ldap), virtual private network (vpn), media access control (mac) addresses, bluesocket, remote authentication dial in user service (radius), pluggable graphical identification and authentication (pgina), protected extensible authentication protocol (peap), and e-mail logins. out of the thirty-nine responses to this question, seven individuals indicated that they do not require any type of authentication at present. although some individuals noted that they are planning to enable some type of authentication in the future, one participant suggested that there were ethical issues involved in requiring users to authenticate. this person argued that “anonymous access to information is valued” and praised his institution’s current policy of allowing “anyone who can find the network” to use it.

a concern about offering wireless network access in the library is how library staff will be prepared to handle the flood of technical questions that are likely to ensue. the level of technical support offered to users varied among the institutions surveyed. more than half of the respondents indicated that users receive help specifically from it staff or from the campus computer center. thirty-nine percent of users received help from the reference desk, while 19 percent received help from circulation staff. thirty-three percent of the responding institutions offered technical help from a web site, while 7 percent indicated that they did not offer any type of technical support to users.

� technical problems
the technical problems most often encountered with wireless networks centered on architectural barriers that cause blackouts or slow spots where wireless access fails. this confirms the importance of carrying out thorough site surveys and testing prior to installation of access points. site surveys may be carried out by companies specially equipped and trained to determine where access points should be installed, the most appropriate type of antennae (directional or omnidirectional), and how many access points are needed to provide the greatest amount of coverage.
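before turning to the remaining technical problems, a brief aside on the authentication methods listed above: as an illustration only, the sketch below shows one common pattern, validating a patron’s campus directory credentials with an ldap bind before granting network access. the directory host, the dn layout, and the surrounding captive-portal or radius machinery are hypothetical and would differ at each institution.

```python
# minimal sketch of ldap credential checking (hypothetical host and dn layout);
# production deployments wrap this in a captive portal, radius, vpn, or similar.
from ldap3 import Server, Connection

def can_join_network(username: str, password: str) -> bool:
    """return True if the campus directory accepts these credentials."""
    server = Server("ldaps://directory.example.edu")          # hypothetical host
    user_dn = f"uid={username},ou=people,dc=example,dc=edu"   # hypothetical dn layout
    conn = Connection(server, user=user_dn, password=password)
    ok = conn.bind()   # a successful simple bind means the credentials are valid
    conn.unbind()
    return ok

if __name__ == "__main__":
    print(can_join_network("student01", "not-a-real-password"))
```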
configuration of the network was the second most highly reported problem associated with installing wireless networks, seeming to suggest the need for librarians to coordinate their efforts and rely on the knowledge provided by the it coordinator (or similar type of personnel) within their institution. lack of technical support available to users, slow speed, and authentication were also indicated as technical problems most encountered (see table 4). integrating the wireless network with the existing wired network was the least-mentioned problem associated with wireless networks. although security problems, particularly concerning wired equivalency protocol (wep) vulnerabilities, have been pointed out as one of the major drawbacks of a wireless network, the majority of users had not as yet experienced security problems. although one participant wrote, “don’t be too casual about the security risks,” another individual wrote, “talk to your networking department,” as many of them are overly worried about security. perceived benefits respondents reported that the number-one benefit of offering wireless access was user satisfaction. giving patrons the ability to use their laptops anywhere in the library and do multiple tasks from one machine is simply becoming what more and more users expect. the secondlargest benefit revolved around flexibility and ease of use due to the lack of wires. thirty-five percent indicated that allowing students to roam the stacks while accessing the network was a significant benefit. although a few studies have suggested the promise of wireless networks for aiding bibliographic instruction, only 9 percent of respondents indicated this as a benefit of wireless technology. use of wireless technology for instruction, it might be recalled, was not a significant factor noted by respondents in the decision to implement a wireless network. likewise, use of this type of network to carry out technical services (such as inventory) was also low on the scale of benefits. seventy-three percent of users claimed that wireless networks have thus far been worth the cost-benefit ratio. while 70 percent indicated moderate to heavy use of the wireless network, 27 percent reported low usage. when asked what advice they would give to others considering adopting wireless networks in their libraries, the overwhelming majority of responses were positive, recommending that hcl take the plunge. as one individual wrote, “offer it and they will come. it has really increased the usage of our library.” other individuals noted that it is simply necessary to offer wireless access to keep up with technological innovation, and that students expect it. the most significant warning, however, revolved around checkout and maintenance of laptops, which, from the results of this survey, seems be both a big advantage and a headache. several individuals echoed the importance of doing site surveys to test bandwidth limitations and access. one particularly energized participant, using multiple exclamations for emphasis, shared a plethora of advice. “throttle connection speeds! allow only http access! block ports and unnecessary protocols! secure your network and disallow unauthenticated users! use access control lists! establish policies that describe wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 17 table 4. 
technical problems encountered problems total number of percent of responses encountered responses out of total number architectural barriers 15 28 configuration problems 12 22 not enough technical help available to users when needed 10 19 slow speed 10 19 authentication problems 10 19 blackouts 6 11 problems installing drivers 6 11 security problems 6 11 difficulty signing on 6 11 problems with operating systems 5 9 other 3 6 problems integrating the wireless network with an existing wired network 2 4 18 information technology and libraries | march 2005 [wireless fidelity] wi-fi risks and liabilities on your part!” useful advice on wireless-access implementation gleaned from this survey fell under the following categories: � be aware of slower speed � create a policy and guide for users � do it because more users are going wireless, it is necessary to keep up with technological innovation, and because students love it � provide plenty of access points � install access points in appropriate places � ensure continuous connectivity by allowing overlap between access points � purchase battery chargers and heavy-duty laptops with extended warranties � get support from it staff for planning and maintenance � offering wireless will increase library usage � perform or have an expert perform a careful site survey and do lots of testing to locate dead or slow spots in the library due to architectural barriers � enable some type of authorization � be aware of security concerns � although the majority of participants’ networks (70 percent) support 802.11b (which allows for throughput up to 11 megabits per second), a few participants suggest using the 802.11g standard (up to 54 megabits per second) because it is “the fastest” and “backwards compatible to 802.11b” � conclusion though it is a relatively new technology, this study found that a surprisingly large number of medium-sized academic libraries are already offering wireless access. not only are they offering wireless access, but they are also providing patrons with laptops for checkout in the library. although actual use of the network by patrons was not determined through survey responses (as individuals were only asked about their observations of network use), the comments and answers were overwhelmingly positive and enthusiastic about this new technology. problems that have been encountered with wireless networks largely revolve around configuration, slow speed, and laptop checkout. although much of the literature focuses on security issues that accompany wireless networking, few individuals reported problems with security. college and university students, like the rest of society, are becoming increasingly mobile. more often, they want access to library networks and the internet wherever they happen to be studying or working on group projects, not merely in computer labs or designated study areas. the majority of the libraries in this study are accommodating these students’ needs by offering wireless access. according to breeding, wireless networking is a rapidly growing niche in the networking world, and mobile computer users will become a larger and larger part of any library’s clientele.14 to encourage patrons to continue visiting them, academic libraries, large and small, should attempt to meet the demand for wireless access if at all possible. references and notes 1. 
myoung-ja lee kwon and aline soules, laptop computer services: spec kit 275 (washington, d.c.: association of research libraries office of leadership and management services, 2003), 11. 2. marshall breeding, “the benefits of wireless technologies,” information today 19, no. 3 (mar. 2002): 42–43. 3. kenneth c. green, “the campus computing project.” accessed mar. 3, 2004, www.campuscomputing.net/. 4. educause center for applied research, “respondent summary: wireless networking in higher education in the u.s. and canada.” accessed dec. 4, 2003, www.educause.edu/ ir/library/pdf/ecar_so/ers/ers0202/ekf0202.pdf. 5. wilfred drew, “wireless networks: new meaning to ubiquitous computing,” journal of academic librarianship 29, no. 2 (mar. 2003): 102–106. 6. kwon and soules, laptop computer services, 11, 15–17. 7. robert e. dugan, “managing laptops and the wireless networks at the mildred f. sawyer library,” journal of academic librarianship 27, no. 4 (jul. 2001): 295–98. 8. questions on the survey did not distinguish as to whether wireless network installations were initiated by it or library personnel. 9. national center for education statistics, “compare academic libraries.” accessed mar. 10, 2004, http://nces.ed.gov/ surveys/libraries/academicpeer/. 10. molly susan mathias and steven heser, “mobilize your instruction program with wireless technology,” computers in libraries 22, no.3 (mar. 2002): 24–30; janice k. patton, “wireless computing in the library: a successful model at st. louis community college,” community & junior-college libraries 10, no. 3 (mar. 2001): 11–16. 11. stephanie diane tolson, “wireless laptops and local area networks.” accessed dec. 11, 2003, www.thejournal.com/ magazine/vault/articleprintversion.cfm?aid=3536. 12. raymond boggs and paul arabasz, “research bulletin: the move to wireless networking in higher education.” accessed dec. 4, 2003, www.educause.edu/ir/library/pdf/erb0207.pdf. 13. mitch tulloch, microsoft encyclopedia of networking (redmond, wash.: microsoft pr., 2002), 122. 14. marshall breeding, “a hard look at wireless networks,” library journal 127, no. 12 (summer 2002): 14–17. 1. has a wireless network been implemented in your library? __yes __no 2. if your library has not adopted wireless networking, are you currently planning or seriously considering it for the near future? __yes (please skip to question 4) __no (please fill out questions 2 and 3 only) 3. what are your primary concerns about implementing a wireless network? check all that apply. __the technology is still new __unsure of its benefits __no need for one __questions regarding security __cost __would not be able to provide technical support that might be needed __funds must primarily support other types of technology at the moment __have not noticed many users with laptops in the library __slow speed of wireless networks __other 4. how long has a wireless network been implemented in your library? __fewer than 6 months __6 months to 1 year __more than 1 to 2 years __more than 2 to 3 years __more than 3 years 5. what were the main reasons for implementing a wireless network? check all that apply. 
__provide greater access to users __campuswide initiative __offer network access outside the library building __provide greater online access due to shortage of computers per user in the library __flexibility (no wires, ease in setting up) __requests expressed by users __low cost __to keep up with or provide technological innovation __to carry out technical services (such as inventory) __aid in bibliographic instruction __for use by library faculty and staff __other 6. please describe the coverage of your network. check all that apply. __campuswide __library building and limited range outside the library building __inside the library (all areas) __select areas within the library 7. what areas of the library are most popularly used for access to the wireless network? check all that apply. __reference and computer media center areas __in the stacks __librarians and staff offices __carrels, tables, reading or study rooms __area outside the library building 8. please list standards your wireless network supports. check all that apply. __802.11b __802.11a __802.11g __bluetooth __other planning and installation 1. was a committee established to plan the implementation and service of the wireless network? __yes __no 2. how long did it take to plan for implementation of the wireless network? __fewer than 6 months __6 months to 1 year __more than 1 to 2 years __more than 2 years 3. how long did it take to install and configure the network? __less than a month __1 to 2 months __more than 2 to 4 months __more than 4 to 6 months __more than 6 months to 1 year __more than 1 year 4. who performed the site survey? check all that apply. __an outside company or contractor appendix. survey: implementation of wireless networks wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 19 20 information technology and libraries | march 2005 __institution’s own information technology coordinator or computer staff __library staff with technical expertise __no site survey was conducted 5. if the site surveyor was an outside company or contractor, please list their company name and whether you would recommend them. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ user services 1. how are users authenticated? _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2. does the library check out laptops to users (for either wired or wireless use)? __yes __no 3. if laptops are available for checkout, do they have wireless capability? __yes __no 4. how many laptops do you have for checkout? __one to five __six to ten __eleven to fifteen __sixteen to twenty __twenty-one to thirty __thirty-one to forty __more than forty 5. how would you describe use of laptops in your library on the average day? __heavy—very noticeable use of laptops __moderate use of laptops __low use of laptops __not sure __hardly even notice laptops are used 6. how do users obtain wireless cards for the network? check all that apply. __check out from library __purchase from library __purchase from the campus computer center __must purchase on their own 7. 
if the library checks out wireless cards, how many were purchased for checkout? __one to five __six to ten __eleven to fifteen __sixteen to twenty __twenty-one to twenty-five __twenty-six to thirty __more than thirty 8. what type of technical support does the library provide to users? check all that apply. __help from reference or help desk __help from the information technology staff or campus computer center __circulation staff __other library staff __from a web site __no technical help is provided to users 9. has the library created a policy for the use of wireless networks? __yes __no 10. are users able to print from the wireless network in the library? __yes __no 11. which of the following may access the wireless network? check all that apply. __laptops __desktop computers __pdas __cell phones __other technical problems 1. what technical problems have you or your users encountered? check all that apply. __blackouts __architectural barriers __slow speed __problems integrating the wireless network with an existing wired network __configuration problems __security problems __authentication problems __problems with operating systems __difficulty signing on __not enough technical help available to users when needed __problems installing drivers __other 2. have you experienced security problems with the network? check all that apply. __have not experienced any security problems __problems with unauthorized people accessing the internet through the wireless network __problems with restricted parts of the network being accessed by unauthorized users __other 3. how were security problems resolved? benefits of use of network 1. what have been the biggest benefits of wireless technology? check all that apply. __user satisfaction __increased access to the internet and online sources __flexibility and ease due to lack of wires __has improved technical services (use for library functions) __has aided in bibliographic instruction __provides access beyond the library building __allows students to roam the stacks while accessing the network __other 2. how would you describe current usage of the network? __heavy __moderate __low 3. in your opinion, has this technology been worth the benefit-cost ratio thus far? __yes __no __not sure 4. what advice would you give to librarians considering this technology? (editorial continued from page 3) design and implementation of complex systems to serve our users. writing about that should not be solitary either. i hope to publish think-pieces from leaders in our field. i hope to publish more articles on the management of information technologies. i hope to increase the number of manuscripts that provide retrospectives. libraries have always been users of information technologies, often early adopters of leading-edge technologies that later become commonplace. we should, upon occasion, remember and reflect upon our development as an information-technology profession. i hope to work with the editorial board, the lita publications committee, and the lita board to find a way, and soon, to facilitate the electronic publication of articles without endangering—but in fact enhancing—the absolutely essential financial contribution that the journal provides to the association. in short, i want to make ital a destination journal of excellence for both readers and authors, and in doing so reaffirm the importance of lita as a professional division of ala. 
to accomplish my goals, i need more than an excellent editorial board, more than first-class referees to provide quality control, and more than the support of the lita officers. i need all lita members to be prospective authors, prospective referees, and prospective literary agents acting on behalf of our profession to continue the almost forty-year tradition begun by fred kilgour and his colleagues, who were our predecessors in volume 1, number 1, march 1966, of our journal. reference 1. walt crawford, first have something to say: writing for the library profession (chicago: ala, 2003). wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 21 110 information technology and libraries | september 2009 employing virtualization in library computing: use cases and lessons learned arwen hutt, michael stuart, daniel suchy, and bradley d. westbrook this paper provides a broad overview of virtualization technology and describes several examples of its use at the university of california, san diego libraries. libraries can leverage virtualization to address many long-standing library computing challenges, but careful planning is needed to determine if this technology is the right solution for a specific need. this paper outlines both technical and usability considerations, and concludes with a discussion of potential enterprise impacts on the library infrastructure. o perating system virtualization, herein referred to simply as “virtualization,” is a powerful and highly adaptable solution to several library technology challenges, such as managing computer labs, automating cataloging and other procedures, and demonstrating new library services. virtualization has been used in one manner or another for decades,1 but it is only within the last few years that this technology has made significant inroads into library environments. virtualization technology is not without its drawbacks, however. libraries need to assess their needs, as well as the resources required for virtualization, before embarking on large-scale implementations. this paper provides a broad overview of virtualization technology and explains its benefits and drawbacks by describing some of the ways virtualization has been used at the university of california, san diego (ucsd) libraries.2 n virtualization overview virtualization is used to partition the physical resources (processor, hard drive, network card, etc.) of one computer to run one or more instances of concurrent, but not necessarily identical, operating systems (oss). traditionally only one instance of an operating system, such as microsoft windows, can be used at any one time. when an operating system is virtualized—creating a virtual machine (vm)—the vm communicates through virtualization middleware to the hardware or host operating system. this middleware also provides a consistent set of virtual hardware drivers that are transparent to the enduser and to the physical hardware. this allows the virtual machine to be used in a variety of heterogeneous environments without the need to reconfigure or install new drivers. with the majority of hardware and compatibility requirements resolved, the computer becomes simply a physical presentation medium for a vm. n two approaches to virtualization: host-based vs. hypervisor virtualization can be implemented using type 1 or type 2 hypervisor architectures. 
a type 2 hypervisor (figure 1), commonly referred to as "host-based virtualization," requires an os such as microsoft windows xp to host a "guest" operating system like linux or even another version of windows. in this configuration, the host os treats the vm like any other application. host-based virtualization products are often intended to be used by a single user on workstation-class hardware. in the type 1 hypervisor architecture (figure 2), commonly referred to as "hypervisor-based virtualization," the virtualization middleware interacts with the computer's physical resources without the need of a host operating system. such systems are usually intended for use by multiple users with the vms accessed over the network. realizing the full benefits of this approach requires a considerable resource commitment for both enterprise-class server hardware and information technology (it) staff.

arwen hutt (ahutt@ucsd.edu) is metadata specialist, michael stuart (mstuart@ucsd.edu) is information technology analyst, daniel suchy (dsuchy@ucsd.edu) is public services technology analyst, and bradley d. westbrook (bradw@library.ucsd.edu) is metadata librarian and digital archivist, university of california, san diego libraries.

use cases

archivists' toolkit

the archivists' toolkit (at) project is a collaboration of the ucsd libraries, the new york university libraries, and the five colleges libraries (amherst college, hampshire college, mt. holyoke college, smith college, and university of massachusetts, amherst) and is funded by the andrew w. mellon foundation. the at is an open-source archival data management system that provides broad, integrated support for the management of archives. it consists of a java client that connects to a relational database back-end (mysql, mssql, or oracle). the database can be implemented on a networked server or a single workstation. since its initial release in december 2006, the at has sparked a great deal of interest and rapid uptake of the application within the archival community. this growing interest has, in turn, created an increased demand for demonstrations of the product, workshops and training, and simpler methods for distributing the application. (of the use cases described here, the two for the at distribution and laptop classroom are exploratory, whereas the rest are in production.)

at workshops

the society of american archivists sponsors a two-day at workshop occurring on multiple dates at several locations. in addition, the at team provides one- and two-day workshops to different institutional audiences. at workshops are designed to give participants a hands-on experience using the at application. accomplishing this effectively requires, at the minimum, supplying all participants with identical but separate databases so that participants can complete the same learning exercises simultaneously and independently without concern for working in each other's space. in addition, an ideal configuration would reduce the workload of the instructors, freeing them from having to set up the at instructional database onsite for each workshop.
for these workshops we needed to do the following:

- provide identical but separate databases and database content for all workshop attendees
- create an easily reproducible installation and setup for workshops by preparing and populating the at instructional database in advance

virtualization allows the at workshop instructors to predefine the workstation configuration, including the installation and population of the at databases, prior to arriving at the workshop site. to accomplish this we developed a workshop vm configuration with mysql and the at client installed within a linux ubuntu os. the workshop instructors then built the at vm with the data they require for the workshop. the at client and database are loaded on a dvd or flash drive and shipped to the classroom managers at the workshop sites, who then need only to install a copy of the vm and the freely available vmplayer software (necessary to launch the at vm) onto each workstation in the classroom. the at vm, once built, can be used many times, both for multiple workstations in a classroom and for multiple workshops at different times and locations. this implementation has worked very well, saving both time and effort for the instructors and classroom support staff by reducing the time and communication necessary for deploying and reconfiguring the vm. it also reduces the chances that there will be an unexpected conflict between the application and the host workstation's configuration. but the method is not perfect. more than anything else, licensing costs motivated us to choose linux as the operating system instead of a proprietary os such as windows. this reduces the cost of using the vm, but it also requires workshop participants to use an os with which they are often unfamiliar. for some participants, unfamiliarity with linux can make the workshop more difficult than it would be if a more ubiquitous os were used.

figure 1. a type 2 hypervisor (host-based) implementation
figure 2. a type 1 (hypervisor-based) implementation

at demonstrations

in a similar vein, members of the at team are often called upon to demonstrate the application at various professional conferences and other venues. these demonstrations require the setup and population of a demonstration database with content for illustrating all of the application's functions. one of the constraints posed by the demonstration scenario is the importance of using a local database instance rather than a networked instance, since network connections can be unreliable or outright unavailable (network connectivity being an issue we've all faced at conferences). another constraint is that portions of the demonstrations need some level of preparation (for example, knowing what search terms will return a nonempty result set), which must be customized for the unique content of a database. a final constraint is that, because portions of the demonstration (import and data merging) alter the state of the database, changes to the database must be easily reversible, or else new examples must be created before the database can be reused. building on our experience of using virtualization to implement multiple copies of an at installation, we evaluated the possibility of using the same technology for simplifying the setup necessary for demonstrating the at.
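a minimal sketch of how the reset-and-launch step might be scripted on a classroom or presentation machine, assuming vmware's vmrun command-line utility is available and that the prebuilt vm contains a snapshot taken after the instructional database was populated; the paths and snapshot name here are hypothetical, not part of the at distribution:

# reset_at_vm.py -- revert the prebuilt archivists' toolkit vm to its prepared
# snapshot and start it, e.g., between workshop sessions or before a demonstration.
# assumes vmware's vmrun utility; paths and names are hypothetical examples.
import subprocess

VMX = "/vms/at-workshop/at-workshop.vmx"   # hypothetical location of the vm's .vmx file
SNAPSHOT = "clean"                         # snapshot taken after the database was populated

def vmrun(*args):
    # call vmrun for a workstation-class product ("ws"; vmware player uses "player")
    # and stop on any failure
    subprocess.run(["vmrun", "-T", "ws", *args], check=True)

if __name__ == "__main__":
    vmrun("revertToSnapshot", VMX, SNAPSHOT)   # discard whatever the last session changed
    vmrun("start", VMX)                        # launch the vm for the next group
    print("at vm reverted to snapshot '%s' and started" % SNAPSHOT)

the same revert step is what gives the demonstration database the easy reversibility described next.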
as with the workshops, the use of a vm for at demonstrations allows for easy distribution of a prepopulated database, which can be used by multiple team members at disparate geographic locations and on different host oss. this significantly reduces the cost of creating (and recreating) demonstration databases. in addition, demonstration scripts can be shared between team members, creating additional time savings as well as facilitating team participation in the development and refinement of the demonstration. perhaps most important is the ability to roll back the vm to a specific state or snapshot of the database. this means the database can be quickly returned to its original state after being altered during a demonstration. overall, despite our initial anxiety about depending on the vm for presentations to large audiences, this solution has proven very useful, reliable, and cost-effective. at distribution implementing the at requires installing both the toolkit client and a database application such as mysql, instantiating an at database, and establishing the connection between database and client. for many potential customers of the at, the requirements for database creation and management can be a significant barrier due to inexperience with how such processes work and a lack of readily available it resources. many of these customers simply desire a plug-and-play version of the application that they can install and use without requiring technical assistance. it is possible to satisfy this need for a plug-and-play at by constructing a vm containing a fully installed and ready-to-use at application and database instance. this significantly reduces the number and difficulty of steps involved in setting up a functional at instance. the customer would only need to transfer the vm from a dvd or other source to their computer, download and install the vm reader, and then launch the at vm. they would then be able to begin using the at immediately. this removes the need for the user to perform database creation and management; arguably the most technically challenging portion of the setup process. users would still have the option of configuring the application (default values, lookup lists, etc.) in accord with the practices of their repository. batch processing catalog records the rapid growth of electronic resources is significantly changing the nature of library cataloging. not only are types of library materials changing and multiplying, the amount of e-resources being acquired increases each year. electronic book and music packages often contain tens of thousands of items, each requiring some level of cataloging. because of these challenges, staff are increasingly cataloging resources with specialized programs, scripts, and macros that allow for semiautomated record creation and editing. such tools make it possible to work on large sets of resources—work that would not be financially possible to perform manually item by item. however, the specialized configuration of the workstation required for using these automated procedures makes it very difficult to use the workstation for other purposes at the same time. in fact, user interaction with the workstation while the process is running can cause a job to terminate prior to completion. in either scenario, productivity is compromised. virtualization offers an excellent remedy to this problem. 
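the kind of semiautomated job such a specialized workstation configuration runs might look like the following minimal sketch, which assumes the pymarc library and hypothetical file names; it simply sets aside newly loaded e-resource records that lack an 856 (electronic location) field so a cataloger can review them by hand:

# split_eresource_records.py -- scan a batch of marc records and separate any record
# without an 856 (electronic location) field for manual review.
# a sketch only: assumes the pymarc library; the file names are hypothetical.
from pymarc import MARCReader, MARCWriter

with open("new_eresources.mrc", "rb") as source, \
     open("ready_to_load.mrc", "wb") as ready, \
     open("needs_review.mrc", "wb") as review:
    ok_writer = MARCWriter(ready)
    review_writer = MARCWriter(review)
    for record in MARCReader(source):
        if record is None:                   # skip records pymarc could not parse
            continue
        if record.get_fields("856"):
            ok_writer.write(record)          # has at least one online-access field
        else:
            review_writer.write(record)      # no url; route to manual review
    ok_writer.close()
    review_writer.close()

running a job like this inside a vm keeps its environment and any client software isolated from the cataloger's interactive session, which is the point of the configuration that follows.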
a virtual machine configured for semiautomated batch processing allows for unused resources on the workstation to process the batch requests in an isolated environment while, at the same time and on the same machine, the user is able to work on other tasks. in cases where the user's machine is not an ideal candidate for virtualization, the vm can be hosted via a hypervisor-based solution, and the user can access the vm with familiar remote access tools such as remote desktop in windows xp.

secure sandbox

in addition to challenges posed by increasingly large quantities of acquisitions, the ucsd libraries is also encountering an increasing variety of library material types. most notable is the variety and uniqueness of digital media acquired by the library, such as specialized programs to process and view research data sets, new media formats and viewers, and application installers. cataloging some of these materials requires that media be loaded and that applications be installed and run to inspect and validate content. but running or opening these materials, which are sometimes from unknown sources, poses a security risk to both the user's workstation and to the larger pool of library resources accessible via the network. many installers require a user to have administrative privileges, which can pose a threat to network security. the virtual machine allows a user to have administrative privileges within the vm, but not outside of it. the user can be provided with the privileges needed for installing and validating content without modifying their privileges on the host machine. in addition, the vm can be isolated by configuring its network connection so that any potential security risks are limited to the vm instance and do not extend to either the host machine or the network.

laptop classroom

instructors at the ucsd libraries need a laptop classroom that meets the usual requirements for this type of service (mobility, dependability, etc.) but also allows for the variety of computing environments and applications in use throughout our several library locations. in a least-common-denominator scenario, computers are configured to meet a general standard (usually microsoft windows with a standard browser and office suite) and allow minimal customization. while this solution has its advantages and is easy to configure and maintain from the it perspective, it leaves much to be desired for an instructor who needs to use a variety of tools in the classroom, often on demand. the goal in this case is not to settle for a single generic build but instead to look for a solution that accommodates three needs:

- the ability to switch quickly between different customized os configurations
- the ability to add and remove applications on demand in a classroom setting
- the ability to restore a computer modified during class to its original state

of course, regardless of the approach taken, the laptops still needed to retain a high level of system security, application stability, and regular hardware maintenance. after a thorough review of the different technologies and tools already in use in the libraries, we determined that virtualization might also serve to meet the requirements of our laptop classroom. the need to support multiple users and multiple vms makes this scenario an ideal candidate for hypervisor-based virtualization.
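returning briefly to the secure sandbox above, the network isolation mentioned there is typically a small change to the vm's configuration rather than anything elaborate. a minimal sketch, assuming a vmware-style .vmx configuration file with the usual connection types ("bridged", "nat", "hostonly"); the path is hypothetical, and other products expose the same choice differently:

# isolate_sandbox_vm.py -- point a sandbox vm's first network adapter at the
# host-only network so software installed inside it cannot reach the campus network.
# a sketch assuming a vmware-style .vmx file; key names vary by product and version.
from pathlib import Path

VMX_PATH = Path("/vms/sandbox/sandbox.vmx")   # hypothetical sandbox vm

def restrict_to_host_only(vmx_path):
    lines = vmx_path.read_text().splitlines()
    # drop any existing connection-type setting for the first adapter
    kept = [line for line in lines if not line.startswith("ethernet0.connectionType")]
    kept.append('ethernet0.connectionType = "hostonly"')
    vmx_path.write_text("\n".join(kept) + "\n")

if __name__ == "__main__":
    restrict_to_host_only(VMX_PATH)
    print("sandbox vm restricted to the host-only network")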
we decided to use vdi (virtual desktop infrastructure), a commercially available hypervisor product from vmware. vmware is one of the largest providers of virtualization software, and we were already familiar with several iterations of its host-based vm services. the core of our project plan consists of a base vm to be created and managed by our it department. to support a wide variety of applications and instruction styles, instructors could create a customized vm specific to their library's instruction needs with only nominal assistance from it staff. the custom vm would then be made available on demand to the laptops from a central server (as depicted in figure 2 above). in this manner, instructors could "own" and maintain a personal instructional computing environment, while the classroom manager could still ensure that the laptop classroom as a whole maintained the necessary secure software environment required by it. as an added benefit, once these vms are established, they could be accessed and used in a variety of diverse locations.

considerations for implementation

before implementing any virtualization solution, in-depth analysis and testing are needed to determine which type of solution, if any, is appropriate for a specific use case in a specific environment. this analysis should include three major areas of focus: user experience, application performance in the virtualized environment, and effect on the enterprise infrastructure. in this section of the paper, we review considerations that, in hindsight, we would have found to be extremely valuable in the ucsd libraries' various implementations of virtualization.

user experience

traditionally, system engineers have developed systems and tuned performance according to engineering metrics (e.g., megabytes per second and network latency). while such metrics remain valuable to most assessments of a computer application, performance assessments are being increasingly defined by usability and user experience factors. in an academic computing environment, especially in areas such as library computer labs, these newer kinds of performance measures are important indicators of how effectively an application performs and, indirectly, of how well resources are being used. virtualization can be implemented in a way that allows library users to have access to both the virtualized and host oss or to multiple virtualized oss. since virtualization essentially creates layers within the workstation, multiple os layers (either host or virtualized) can cause users to become confused as to which os they are interacting with at a given moment. in that kind of implementation, the user can lose his or her way among the host and guest oss as well as become disoriented by differing features of the virtualized oss. for example, the user may choose to save a file to the desktop, but may not be aware that the file will be saved to the desktop of the virtualized os and not the host os. external device support can also be problematic for the end user, particularly with regard to common devices such as flash drives. the user needs to be aware of which operating system is in use, since it is usually the only one with which an external device is configured to work. authentication to a system is another example of how the relationship between the host and guest os can cause confusion.
the introduction of a second os implicitly creates a second level of authentication and authorization that must be configured separately from that of the host os. user privileges may differ between the host and guest os for a particular vm configuration. for instance, a user might need to remember two logins or at least enter the same login credentials twice. these unexpected differences between the host and guest os produce negative effects on a user’s experience. this can be a critical factor in a time-sensitive environment such as a computer lab, where the instructor needs to devote class time to teaching and not to preparing the computers for use and navigating students through applications. interface latency and responsiveness latency (meaning here the responsiveness or “sluggishness” of the software application or the os) in any interface can be a problem for usability. developers devote a significant amount of time to improving operating systems and application interfaces to specifically address this issue. however, users will often be unable to recognize when an application is running a virtualized os and will thus expect virtualized applications to perform with the same responsiveness as applications that are not-virtualized. in our experience, some vm implementations exhibit noticeable interface latency because of inherent limitations of the virtualization software. perhaps the most notable and restrictive limitation is the lack of advanced 3d video rendering capability. this is due to the lack of support for hardware-accelerated graphics, thus adding an extra layer of communication between the application and the video card and slowing down performance. in most hardware-accelerated 3d applications (e.g., google earth pro or second life), this latency is such a problem that the application becomes unusable in a virtualized environment. recent developments have begun to address and, in some cases, overcome these limitations.3 in every virtualization solution there is overhead for the virtualization software to do its job and delegate resources. in our experience, this has been found to cause an approximately 10–20 percent performance penalty. most applications will run well with little or moderate changes to configuration when virtualized, but the overhead should not be overlooked or assumed to be inconsequential. it is also valuable to point out that the combination of applications in a vm, as well as vms running together on the same host, can create further performance issues. traditional bottlenecks the bottlenecks faced in traditional library computing systems also remain in almost every virtualization implementation. general application performance is usually limited by the specifications of one or more of the following components: processor, memory, storage, and network hardware. in most cases, assuming adequate hardware resources are available, performance issues can be easily addressed by reconfiguring the resources for the vm. for example, a vm whose application is memorybound (i.e., performance is limited by the memory available to the vm), can be resolved by adjusting the amount of memory allocated to the vm. a critical component of planning a successful virtualization deployment includes a thorough analysis of user workflow and the ways in which the vm will be utilized. although the types of user workflows may vary widely, analysis and testing serve to predict and possibly avoid potential bottlenecks in system performance. 
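as a rough illustration of the sizing arithmetic such an analysis involves, the sketch below estimates how many vms of a given memory allocation fit on one host once an overhead in the 10-20 percent range is set aside; every figure is a hypothetical example rather than a measurement from the ucsd deployment:

# vm_capacity_estimate.py -- back-of-the-envelope sizing for a virtualization host.
# all numbers are hypothetical examples; substitute measured values from testing.

def vms_per_host(host_ram_gb, host_reserved_gb, vm_ram_gb, overhead=0.15):
    # estimate how many vms fit, charging each vm its allocation plus a
    # virtualization overhead (the 10-20 percent penalty discussed above)
    usable = host_ram_gb - host_reserved_gb        # ram left after the host os / hypervisor
    effective_per_vm = vm_ram_gb * (1 + overhead)  # each vm costs a bit more than its allocation
    return int(usable // effective_per_vm)

if __name__ == "__main__":
    # example: a 32 gb server, 4 gb held back for the hypervisor and management,
    # guests allocated 2 gb each, 15 percent overhead assumed
    print(vms_per_host(host_ram_gb=32, host_reserved_gb=4, vm_ram_gb=2))   # -> 12

the same arithmetic applies to processor and network budgets, and a memory-bound vm of the kind described above is addressed by raising the vm's allocation rather than by touching the host.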
enterprise impact

when assessing the effect virtualization will have on your library infrastructure, it is important to have an accurate understanding of the resources and capabilities that will form the foundation for the virtualized infrastructure. it is a misconception that it is necessary to purchase state-of-the-art hardware to implement virtualization. not only are organizations realizing how to utilize existing hardware better with virtualization for specific projects, they are discovering that the technology can be extended to the rest of the organization and be successfully integrated into their it management practices. virtualization does, however, impose certain performance requirements for large-scale deployments that will be used in a 24/7 production environment. in such scenarios, organizations should first compare the level of performance offered by their current hardware resources with the performance of new hardware. the most compelling reasons to buy new servers include the economies of scale that can be obtained by running more vms on fewer, more robust servers, as well as the enhanced performance supplied by newer, more virtualization-aware hardware. in addition, virtualization allows for resources to be used more efficiently, resulting in lower power consumption and cooling costs.

also, the network is often one of the most overlooked factors when planning a virtualization project. while a local virtualized environment (i.e., a single computer) may not necessarily require a high-performance network environment, any solution that calls for a hypervisor-based infrastructure requires considerable planning and scaling for bandwidth requirements. the current network hardware available in your infrastructure may not perform or scale adequately to meet the needs of this vm use. again, this highlights the importance of thorough user workflow analyses and testing prior to implementation.

depending on the scope of your virtualization project, deployment in your library can potentially be expensive and can have many indirect costs. while the initial investment in hardware is relatively easy to calculate, other factors, such as ongoing staff training and system administration overhead, are much more difficult to determine. in addition, virtualization adds another layer to oftentimes already complex software licensing terms. to deal with the increased use of virtualization, software vendors are devoting increasing attention to the intricacies of licensing their products for use in such environments. while virtualization can ameliorate some licensing constraints (as noted in the at workshop use case), it can also conceal and promote licensing violations, such as multiple uses of a single-license application or access to license-restricted materials. license review is a prudent and highly recommended component of implementing a virtualization solution. finally, concerning virtualization software itself, it should be noted that while commercial vm companies usually provide plentiful resources for aiding implementation, several worthy open-source options also exist. as with any open-source software, the total cost of operation (e.g., the costs of development, maintenance, and support) needs to be considered.

conclusion

as our use cases illustrate, there are numerous potential applications and benefits of virtualization technology in the library environment.
while we have illustrated a number of these, many more possibilities exist, and further opportunities for its application will be discovered as virtualization technology matures and is adapted by a growing number of libraries. as with any technology, there are many factors that must be taken into account to evaluate if and when virtualization is the right tool for the job. in short, successful implementation of virtualization requires thoughtful planning. when so implemented, virtualization can provide libraries with cost-effective solutions to long-standing problems. references and notes 1. alessio gaspar et al., “the role of virtualization in computing education,” in proceedings of the 39th sigcse technical symposium on computer science education (new york: acm, 2008): 131–32; paul ghostine, “desktop virtualization: streamlining the future of university it,” information today 25, no. 2 (2008): 16; robert p. goldberg, “formal requirements for virtualizable third generation architectures,” in communications of the acm 17, no. 7 (new york: acm, 1974): 412–21; and karissa miller and mahmoud pegah, “virtualization: virtually at the desktop,” in proceedings of the 35th annual acm siguccs conference on user services (new york: acm, 2007): 255–60. 2. for other, non–ucsd use cases of virtualization, see joel c. adams and w. d. laverell, “configuring a multi-course lab for system-level projects,” sigcse bulletin 37, no. 1 (2005): 525–29; david collins, “using vmware and live cd’s to configure a secure, flexible, easy to manage computer lab environment,” journal of computing for small colleges 21, no. 4 (2006): 273–77; rance d. necaise, “using vmware for dual operating systems,” journal of computing in small colleges 17, no. 2 (2001): 294–300; and jason nieh and chris vaill, “experiences teaching operating systems using virtual platforms and linux,” sigcse bulletin 37, no 1 (2005): 520–24. 3. h. andrés lagar-cavilla, “vmgl (formerly xen-gl): opengl hardware 3d acceleration for virtual machines,” www .cs.toronto.edu/~andreslc/xen-gl/ (accessed oct. 21, 2008). 6 information technology and libraries | march 2009 paul t. jaeger and zheng yan one law with two outcomes: comparing the implementation of cipa in public libraries and schools though the children’s internet protection act (cipa) established requirements for both public libraries and public schools to adopt filters on all of their computers when they receive certain federal funding, it has not attracted a great amount of research into the effects on libraries and schools and the users of these social institutions. this paper explores the implications of cipa in terms of its effects on public libraries and public schools, individually and in tandem. drawing from both library and education research, the paper examines the legal background and basis of cipa, the current state of internet access and levels of filtering in public libraries and public schools, the perceived value of cipa, the perceived consequences of cipa, the differences in levels of implementation of cipa in public libraries and public schools, and the reasons for those dramatic differences. after an analysis of these issues within the greater policy context, the paper suggests research questions to help provide more data about the challenges and questions revealed in this analysis. 
t he children’s internet protection act (cipa) established requirements for both public libraries and public schools to—as a condition for receiving certain federal funds—adopt filters on all of their computers to protect children from online content that was deemed potentially harmful.1 passed in 2000, cipa was initially implemented by public schools after its passage, but it was not widely implemented in public libraries until the 2003 supreme court decision (united states v. american library association) upholding the law’s constitutionality.2 now that cipa has been extensively implemented for five years in libraries and eight years in schools, it has had time to have significant effects on access to online information and services. while the goal of filtering requirements is to protect children from potentially inappropriate content, filtering also creates major educational and social implications because filters also limit access to other kinds of information and create different perceptions about schools and libraries as social institutions. curiously, cipa and its requirements have not attracted a great amount of research into the effects on schools, libraries, and the users of these social institutions. much of the literature about cipa has focused on practical issues—either recommendations on implementing filters or stories of practical experiences with filtering. while those types of writing are valuable to practitioners who must deal with the consequences of filtering, there are major educational and societal issues raised by filtering that merit much greater exploration. while relatively small bodies of research have been generated about cipa’s effects in public libraries and public schools,3 thus far these two strands of research have remained separate. but it is the contention of this paper that these two strands of research, when viewed together, have much more value for creating a broader understanding of the educational and societal implications. it would be impossible to see the real consequences of cipa without the development of an integrative picture of its effects on both public schools and public libraries. in this paper, the implications of cipa will be explored in terms of effects on public libraries and public schools, individually and in tandem. public libraries and public schools are generally considered separate but related public sphere entities because both serve core educational and information-provision functions in society. furthermore, the fact that public schools also contain school library media centers highlights some very interesting points of intersection between public libraries and school libraries in terms of the consequences of cipa: while cipa requires filtering of computers throughout public libraries and public schools, the presence of school library media centers makes the connection between libraries and schools stronger, as do the teaching roles of public libraries (e.g., training classes, workshops, and evening classes). n the legal road to cipa history under cipa, public libraries and public schools receiving certain kinds of federal funds are required to use filtering programs to protect children under the age of seventeen from harmful visual depictions on the internet and to provide public notices and hearings to increase public awareness of internet safety. senator john mccain (r-az) sponsored cipa, and it was signed into law by president bill clinton on december 21, 2000. 
paul t. jaeger (pjaeger@umd.edu) is assistant professor at the college of information studies and director of the center for information policy and electronic government of the university of maryland in college park. zheng yan (zyan@uamail.albany.edu) is associate professor at the department of educational and counseling psychology in the school of education of the state university of new york at albany.

cipa requires that filters at public libraries and public schools block three specific types of content: (1) obscene material (that which appeals to prurient interests only and is "offensive to community standards"); (2) child pornography (depictions of sexual conduct and/or lewd exhibitionism involving minors); and (3) material that is harmful to minors (depictions of nudity and sexual activity that lack artistic, literary, or scientific value). cipa focused on "the recipients of internet transmission," rather than the senders, in an attempt to avoid the constitutional issues that undermined the previous attempts to regulate internet content.4 using congressional authority under the spending clause of article i, section 8 of the u.s. constitution, cipa ties the direct or indirect receipt of certain types of federal funds to the installation of filters on library and school computers. therefore each public library and school that receives the applicable types of federal funding must implement filters on all computers in the library and school buildings, including computers that are exclusively for staff use. libraries and schools had to address these issues very quickly because the federal communications commission (fcc) mandated certification of compliance with cipa by funding year 2004, which began in summer 2004.5

cipa requires that filters on computers block three specific types of content, and each of the three categories of materials has a specific legal meaning. the first type—obscene materials—is statutorily defined as depicting sexual conduct that appeals only to prurient interests, is offensive to community standards, and lacks serious literary, artistic, political, or scientific value.6 historically, obscene speech has been viewed as being bereft of any meaningful ideas or educational, social, or professional value to society.7 statutes regulating speech as obscene have to do so very carefully and specifically, and speech can only be labeled obscene if the entire work is without merit.8 if speech has any educational, social, or professional importance, even for embodying controversial or unorthodox ideas, it is supposed to receive first amendment protection.9 the second type of content—child pornography—is statutorily defined as depicting any form of sexual conduct or lewd exhibitionism involving minors.10 both of these types of speech have a long history of being regulated and being considered as having no constitutional protections in the united states. the third type of content that must be filtered—material that is harmful to minors—encompasses a range of otherwise protected forms of speech.
cipa defines “harmful to minors” as including any depiction of nudity, sexual activity, or simulated sexual activity that has no serious literary, artistic, political, or scientific value to minors.11 the material that falls into this third category is constitutionally protected speech that encompasses any depiction of nudity, sexual activity, or simulated sexual activity that has serious literary, artistic, political, or scientific value to adults. along with possibly including a range of materials related to literature, art, science, and policy, this third category may involve materials on issues vital to personal well-being such as safe sexual practices, sexual identity issues, and even general health care issues such as breast cancer. in addition to the filtering requirements, section 1731 also prescribes an internet awareness strategy that public libraries and schools must adopt to address five major internet safety issues related to minors. it requires libraries and schools to provide reasonable public notice and to hold at least one public hearing or meeting to address these internet safety issues. requirements for schools and libraries cipa includes sections specifying two major strategies for protecting children online (mainly in sections 1711, 1712, 1721, and 1732) as well as sections describing various definitions and procedural issues for implementing the strategies (mainly in sections 1701, 1703, 1731, 1732, 1733, and 1741). section 1711 specifies the primary internet protection strategy—filtering—in public schools. specifically, it amends the elementary and secondary education act of 1965 by limiting funding availability for schools under section 254 of the communication act of 1934. through a compliance certification process within a school under supervision by the local educational agency, it requires schools to include the operation of a technology protection measure that protects students against access to visual depictions that are obscene, are child pornography, or are harmful to minors under the age of seventeen. likewise, section 1712 specifies the same filtering strategy in public libraries. specifically, it amends section 224 of the museum and library service act of 1996/2003 by limiting funding availability for libraries under section 254 of the communication act of 1934. through a compliance certification process within a library under supervision by the institute of museum and library services (imls), it requires libraries to include the operation of a technology protection measure that protects students against access to visual depictions that are obscene, child pornography, or harmful to minors under the age of seventeen. section 1721 is a requirement for both libraries and schools to enforce the internet safety policy with the internet safety policy strategy and the filtering technology strategy as a condition of universal service discounts. specifically, it amends section 254 of the communication act of 1934 and requests both schools and libraries to monitor the online activities of minors, operate a technical protection measure, provide reasonable public notice, and hold at least one public hearing or meeting to address the internet safety policy. this is through the 8 information technology and libraries | march 2009 certification process regulated by the fcc. 
section 1732, titled the neighborhood children’s internet protection act (ncipa), amends section 254 of the communication act of 1934 and requires schools and libraries to adopt and implement an internet safety policy. it specifies five types of internet safety issues: (1) access by minors to inappropriate matter on the internet; (2) safety and security of minors when using e-mail, chat rooms, and other online communications; (3) unauthorized access; (4) unauthorized disclosure, use, and dissemination of personal information; and (5) measures to restrict access to harmful online materials. from the above summary, it is clear that (1) the two protection strategies of cipa (the internet filtering strategy and safety policy strategy) were equally enforced in both public schools and public libraries because they are two of the most important social institutions for children’s internet safety; (2) the nature of the implementation mechanism is exactly the same, using the same federal funding mechanisms as the sole financial incentive (limiting funding availability for schools and libraries under section 254 of the communication act of 1934) through a compliance certification process to enforce the implementation of cipa; and (3) the actual implementation procedure differs in libraries and schools, with schools to be certified under the supervision of local educational agencies (such as school districts and state departments of education) and with libraries to be certified within a library under the supervision of the imls. economics of cipa the universal service program (commonly known as e–rate) was established by the telecommunications act of 1996 to provide discounts, ranging from 20 to 90 percent, to libraries and schools for telecommunications services, internet services, internal systems, and equipment.12 the program has been very successful, providing approximately $2.25 billion dollars a year to public schools, public libraries, and public hospitals. the vast majority of e-rate funding—about 90 percent—goes to public schools each year, with roughly 4 percent being awarded to public libraries and the remainder going to hospitals.13 the emphasis on funding schools results from the large number of public schools and the sizeable computing needs of all of these schools. but even 4 percent of the e-rate funding is quite substantial, with public libraries receiving more than $250 million between 2000 and 2003.14 schools received about $12 billion in the same time period.15 along with e-rate funds, the library services and technology act (lsta) program administered by the imls provides money to each state library agency to use on library programs and services in that state, though the amount of these funds is considerably lower than e-rate funds. the american library association (ala) has noted that the e-rate program has been particularly significant in its role of expanding online access to students and to library patrons in both rural and underserved communities.16 in addition to the effect on libraries, e-rate and lsta funds have significantly affected the lives of individuals and communities. these programs have contributed to the increase in the availability of free public internet access in schools and libraries. 
by 2001, more than 99 percent of public school libraries provided students with internet access.17 by 2007, 99.7 percent of public library branches were connected to the internet, and 99.1 percent of public library branches offered public internet access.18 however, only a small portion of libraries and schools used filters prior to cipa.19 since the advent of computers in libraries, librarians typically had used informal monitoring practices for computer users to ensure that nothing age-inappropriate or morally offensive was publicly visible.20 some individual school and library systems, such as in kansas and indiana, even developed formal or informal statewide internet safety strategies and approaches.21

why were only libraries and schools chosen to protect children's online safety?

while there are many social institutions that could have been the focus of cipa, the law places the requirements specifically on public libraries and public schools. if congress was so interested in protecting children from access to harmful internet content, it seems that the law would be more expansive and focused on the content itself rather than filtering access to the content. however, earlier laws that attempted to regulate access to internet content failed legal challenges specifically because they tried to regulate content. prior to the enactment of cipa, there were a number of other proposed laws aimed at preventing minors from accessing inappropriate internet content. the communications decency act (cda) of 1996 prohibited the sending or posting of obscene material through the internet to individuals under the age of eighteen.22 however, the supreme court found the cda to be unconstitutional, stating that the law violated free speech under the first amendment. in 1998, congress passed the child online protection act (copa), which prohibited commercial websites from displaying material deemed harmful to minors and imposed criminal penalties on internet violators.23 a three-judge panel of the district court for the eastern district of pennsylvania ruled that copa's focus on "contemporary community standards" violated the first amendment, and the panel subsequently imposed an injunction on copa's enforcement. cipa's force comes from congress's power under the spending clause; that is, congress can legally attach requirements to funds that it gives out. since cipa is based on economic persuasion—the potential loss of funds for technology—the law can only have an effect on recipients of those funds. while regulating internet access in other venues like coffee shops, internet cafés, bookstores, and even individual homes would provide a more comprehensive shield to limit children's access to certain online content, these institutions could not be reached under the spending clause. as a result, the burdens of cipa fall squarely on public libraries and public schools.

the current state of filtering

when did cipa actually come into effect in libraries and schools?

after overcoming a series of legal challenges that were ultimately decided by the supreme court, cipa came into effect in full force in 2003, though 96 percent of public schools were already in compliance with cipa in 2001. when the court upheld the constitutionality of cipa, the legal challenge by public libraries centered on the way the statute was written.24 the court's decision states that the wording of the law does not place unconstitutional limitations on free speech in public libraries.
to continue receiving federal dollars directly or indirectly through certain federal programs, public libraries and schools were required to install filtering technologies on all computers. while the case decided by the supreme court focused on public libraries, the decision virtually precludes public schools from making the same or related challenges.25 before that case was decided, however, most schools had already adopted filters to comply with cipa. as a result of cipa, a public library or public school must install technology protection measures, better known as filters, on all of its computers if it receives

- e-rate discounts for internet access costs,
- e-rate discounts for internal connections costs,
- lsta funding for direct internet costs,26 or
- lsta funding for purchasing technology to access the internet.

the requirements of cipa extend to public libraries, public schools, and any library institution that receives lsta and e-rate funds as part of a system, including state library agencies and library consortia. as a result of the financial incentives to comply, almost 100 percent of public schools in the united states have implemented the requirements of cipa,27 and approximately half of public libraries have done so.28

how many public schools have implemented cipa?

according to the latest report by the department of education (see table 1), by 2005, 100 percent of public schools had implemented both the internet filtering strategy and the safety policy strategy. in fact, in 2001 (the first year cipa was in effect), 96 percent of schools had implemented cipa, with 99 percent filtering by 2002. when compared to the percentage of all public schools with internet access from 1994 to 2005, internet access became nearly universal in schools between 1999 and 2000 (95 to 98 percent), and one can see that the internet access percentage in 2001 was almost the same as the cipa implementation percentage. according to the department of education, the above estimations are based on a survey of 1,205 elementary and secondary schools selected from 63,000 elementary schools and 21,000 secondary and combined schools.29 after reviewing the design and administration of the survey, it can be concluded that these estimations should be considered valid and reliable and that cipa was immediately and consistently implemented in the majority of public schools since 2001.30

table 1. implementation of cipa in public schools

year           1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2005
access (%)       35    50    65    78    89    95    98    99    99   100   100
filtering (%)     -     -     -     -     -     -     -    96    99    97   100

how many public libraries have implemented cipa?

in 2002, 43.4 percent of public libraries were receiving e-rate discounts, and 18.9 percent said they would not apply for e-rate discounts if cipa was upheld.31 since the supreme court decision upholding cipa, the number of libraries complying with cipa has increased, as has the number of libraries not applying for e-rate funds to avoid complying with cipa.
a number of state and local governments have also passed their own laws to encourage or require all libraries in the state to filter internet access regardless of e-rate or lsta funds.32 in 2008, 38.2 percent of public libraries were filtering access within the library as a result of directly receiving e-rate funding.33 furthermore, 13.1 percent of libraries were receiving e-rate funding as a part of another organization, meaning that these libraries also would need to comply with cipa’s requirements.34 as such, the number of public libraries filtering access is now at least 51.3 percent, but the number will likely be higher as a result of state and local laws requiring libraries to filter as well as other reasons libraries have implemented filters. in contrast, among libraries not receiving e-rate funds, the number of libraries now not applying for e-rate intentionally to avoid the cipa requirements is 31.6 percent.35 while it is not possible to identify an exact number of public libraries that filter access, it is clear that libraries overall have far lower levels of filtering than the 100 percent of public schools that filter access. e-rate and other program issues the administration of the e-rate program has not occurred without controversy. throughout the course of the program, many applicants for and recipients of the funding have found the program structure to be obtuse, the application process to be complicated and time consuming, and the administration of the decision-making process to be slow.36 as a result, many schools and libraries find it difficult to plan ahead for budgeting purposes, not knowing how much funding they will receive or when they will receive it.37 there also have been larger difficulties for the program. following revelations about the uses of some e-rate awards, the fcc suspended the program from august to december 2004 to impose new accounting and spending rules for the funds, delaying the distribution of over $1 billion in funding to libraries and schools.38 news investigations had discovered that certain school systems were using e-rate funds to purchase more technology than they needed or could afford to maintain, and some school systems failed to ever use technology they had acquired.39 while the administration of the e-rate program has been comparatively smooth since, the temporary suspension of the program caused serious short-term problems for, and left a sense of distrust of, the program among many recipients.40 filtering issues during the 1990s, many types of software filtering products became available to consumers, including serverside filtering products (using a list of server-selected blocked urls that may or may not be disclosed to the user), client-side filtering (controlling the blocking of specific content with a user password), text-based content-analysis filtering (removing illicit content of a website using real-time analysis), monitoring and timelimiting technologies (tracking a child’s online activities and limiting the amount of time he or she spends online), and age-verification systems (allowing access to webpages by passwords issued by a third party to an adult).41 but because filtering software companies make the decisions about how the products work, content and collection decisions for electronic resources in schools and public libraries have been taken out of the hands of librarians, teachers, and local communities and placed in the trust of proprietary software products.42 some filtering programs also have specific political 
agendas, which many organizations that purchase them are not aware of.43 in a study of over one million pages, for every webpage blocked by a filter as advertised by the software vendor, one or more pages were blocked inappropriately, while many of the criteria used by the filtering products go beyond the criteria enumerated in cipa.44 filters have significant rates of inappropriately blocking materials, meaning that filters misidentify harmless materials as suspect and prevent access to harmless items (e.g., one filter blocked access to the declaration of independence and the constitution).45 furthermore, when libraries install filters to comply with cipa, in many instances the filters will frequently be blocking text as well as images, and (depending on the type of filtering product employed) filters may be blocking access to entire websites or even all the sites from certain internet service providers. as such, the current state of filtering technology will create the practical effect of cipa restricting access to far more than just certain types of images in many schools and libraries.46

differences in the perceived value of cipa and filtering

based on the available data, there clearly is a sizeable contrast in the levels of implementation of cipa between schools and libraries. this difference raises a number of questions: for what reasons has cipa been much more widely implemented in schools? is this issue mainly value-driven, dollar-driven, both, or neither in these two public institutions? why are these two institutions so different regarding cipa implementation while they share many social and educational similarities?

reasons for nationwide full implementation in schools

there are various reasons—from financial, population, social, and management issues to computer and internet availability—that have driven the rapid and comprehensive implementation of filters in public schools. first, public schools have to implement cipa because of societal pressures and the lobbying of parents to ensure students' internet safety. almost all users of computers in schools are minors, the group most vulnerable to internet crimes and child pornography. public schools in america have been the focus of public attention and scrutiny for years, and the political and social responsibility of public schools for children's internet safety is huge. as a result, society has decided these students should be most strongly protected, and cipa was implemented immediately and most widely at schools. second, in contrast to public libraries (which average slightly less than eleven computers per library outlet), the typical number of computers in public schools ranges from one hundred to five hundred, which are needed to meet the needs of students and teachers for daily learning and teaching. since the number of computers is quite large, the financial incentives of e-rate funding are substantial and critical to the operation of the schools. this situation provides administrators in schools and school districts with the incentive to make decisions to implement cipa as quickly and extensively as possible. furthermore, the amount of money that e-rate provides for schools in terms of technology is astounding. as was noted earlier, schools received over $12 billion from 2000 to 2003 alone. schools likely would not be able to provide the necessary computers for students and teachers without the e-rate funds.
third, the actual implementation procedure differs in schools and libraries: schools are certified under the supervision of the local educational agencies such as school districts and state departments of education; libraries are certified within a library organization under the supervision of the imls. in other words, the certification process at schools is directly and effectively controlled by school districts and state departments of education, following the same fundamental values of protecting children. the resistance to cipa in schools has been very small in comparison to libraries. the primary concern raised has been the issue of educational equality. concerns have been raised that filters in schools may create two classes of students—ones with only filtered access at school and ones who also can get unfiltered access at home.47 reasons for more limited implementation in libraries in public libraries, the reasons for implementing cipa are similar to those of public schools in many ways. public libraries provide an average of 10.7 computers in each of the approximately seven thousand public libraries in the united states, which is a lot of technology that needs to be supported. the e-rate and lsta funds are vital to many libraries in the provision of computers and the internet. furthermore, with limited alternative sources of funding, the e-rate and lsta funds are hard to replace if they are not available. given that public libraries have become the guarantor of public access to computing and the internet, libraries have to find ways to ensure that patrons can access the internet.48 libraries also have to be concerned about protecting and providing a safe environment for younger patrons. while libraries serve patrons of all ages, one of the key social expectations of libraries is the provision of educational materials for children and young adults. children’s sections of libraries almost always have computers in them. much of the content blocked by filters is of little or no educational value. as such, “defending unfiltered internet access was quite different from defending catcher in the rye.”49 nevertheless, many libraries have fought against the filtering requirements of cipa because they believe that it violates the principles of librarianship or for a number of other reasons. in 2008, 31.6 percent of public libraries refused to apply for e-rate or lsta funds specifically to avoid cipa requirements, a substantial increase from the 15.3 percent of libraries that did not apply for e-rate because of cipa in 2006.50 in defending patrons’ rights to free access, the libraries that are not applying for e-rate funds because of the requirements of cipa are turning down funding that would help pay for internet access in order to preserve unfiltered community access to the internet. because many libraries feel that they cannot apply for e-rate funds, local and regional discrepancies are occurring in the levels of internet access that are available to patrons of public libraries in different parts of the country.51 for adult patrons who wish to access material on computers with filters, cipa states that the library has the option of disabling the filters for “bona fide research or other lawful purposes” when adult patrons request such disabling. the law does not require libraries to disable the filters for adult patrons, and the criteria for disabling of filters do not have a set definition in the law.
the potential problems in the process of having the filters disabled are many and significant, including librarians not allowing the filters to be turned off, librarians not knowing how to turn the filters off, the filtering software being too complicated to turn off without injuring the performance of the workstation in other applications, or the filtering software being unable to be turned off in a reasonable amount of time.52 it has been estimated that approximately 11 million low-income individuals rely on public libraries to access online information because they lack internet access at home or work.53 the e-rate and lsta programs have helped to make public libraries a trusted community source of internet access, with the public library being the only source of free public internet access available to all community residents in nearly 75 percent of communities in the united states.54 therefore, usage of computers and the internet in public libraries has continued to grow at a very fast pace over the past ten years.55 thus public libraries are torn between the values of providing safe access for younger patrons and broad access for adult patrons who may have no other means of accessing the internet. cipa, public policy, and further research while the diverse implementations, effects, and levels of acceptance of cipa across schools and libraries demonstrate the wide range of potential ramifications of the law, surprisingly little consideration is given to major assumptions in the law, including the appropriateness of the requirements to different age groups and the nature of information on the internet. cipa treats all users as if they are at the same level of maturity and need the same level of protection as a small child, as evidenced by the requirement that all computers in a library or school have filters regardless of whether children use a particular computer. in reality, children and adults interact in different social, physical, and cognitive ways with computers because of different developmental processes.56 cipa fails to recognize that children as individual users are active processors of information and that children of different ages are going to be affected in divergent ways by filtering programs.57 younger children benefit from more restrictive filters while older children benefit from less restrictive filters. moreover, filtering can be complemented by encouragement of frequent, positive internet usage and by informal instruction in constructive use. finally, children of all ages need a better understanding of the structure of the internet to encourage appropriate caution in terms of online safety. the internet represents a new social and cultural environment in which users simultaneously are affected by the social environment and also construct that environment with other users.58 cipa also is based on fundamental misconceptions about information on the internet. the supreme court’s decision upholding cipa represents several of these misconceptions, adopting an attitude that ‘we know what is best for you’ in terms of the information that citizens should be allowed to access.59 it assumes that schools and libraries select printed materials out of a desire to protect and censor rather than recognizing the basic reality that only a small number of print materials can be afforded by any school or library. the internet frees schools and libraries from many of these costs.
furthermore, the court assumes that libraries should censor the internet as well, ultimately upholding the same level of access to information for adult patrons and librarians in public libraries as for students in public schools. these two major unexamined assumptions in the law certainly have played a part in the difficulty of implementing cipa and in the resistance to the law. and this does not even address the problems of assuming that public libraries and public schools can be treated interchangeably in crafting legislation. these problematic assumptions point to a significantly larger issue: in trying to deal with the new situations created by the internet and related technology, the federal government has significantly increased the attention paid to information policy.60 over the past few years, government laws and standards related to information have begun to more clearly relate to social aspects of information technologies such as the filtering requirements of cipa.61 but the social, economic, and political ramifications of decisions about information policy are often woefully underexamined in the development of legislation.62 this paper has documented that many of the reasons for and statistics about cipa implementation are available by bringing together information from different social institutions. the biggest questions about cipa are about the societal effects of the policy decisions: • has cipa changed the education and information-provision roles of libraries and schools? • has cipa changed the social expectations for libraries and schools? • have adult patron information behaviors changed in libraries? • have minor patron information behaviors changed in libraries? • have student information behaviors changed in school? • how has cipa changed the management of libraries and schools? • will congress view cipa as successful enough to merit using libraries and schools as the means of enforcing other legislation? but these social and administrative concerns are not the only major research questions raised by the implementation of cipa. future research about cipa needs to focus not only on the individual, institutional, and social effects of the law; it must also explore the lessons that cipa can provide to the process of creating and implementing information policies with significant societal implications. the most significant research issues related to cipa may be the ones that help illuminate how to improve the legislative process to better account for the potential consequences of regulating information while the legislation is still being developed. such cross-disciplinary analyses would be of great value as information becomes the center of an increasing amount of legislation, and the effects of this legislation have continually wider consequences for the flow of information through society. it could also be of great benefit to public schools and libraries, which, if cipa is any indication, may play a large role in future legislation about public internet access. references 1. children’s internet protection act (cipa), public law 106-554. 2. united states v. american library association, 539 u.s. 154 (2003). 3. american library association, libraries connect communities: public library funding & technology access study 2007–2008 (chicago: ala, 2008); paul t. jaeger, john carlo bertot, and charles r.
mcclure, “the effects of the children’s internet protection act (cipa) in public libraries and its implications for research: a statistical, policy, and legal analysis,” journal of the american society for information science and technology 55, no. 13 (2004): 1131–39; paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology and libraries 26, no. 2 (2007): 4–14; paul t. jaeger et al., “cipa: decisions, implementation, and impacts,” public libraries 44, no. 2 (2005): 105–9; zheng yan, “limited knowledge and limited resources: children’s and adolescents’ understanding of the internet,” journal of applied developmental psychology (forthcoming); zheng yan, “differences in basic knowledge and perceived education of internet safety between high school and undergraduate students: do high school students really benefit from the children’s internet protection act?” journal of applied developmental psychology (forthcoming); zheng yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?,” developmental psychology 42 (2006): 418–28. 4. martha m. mccarthy, “filtering the internet: the children’s internet protection act,” educational horizons 82, no, 2 (winter 2004): 108. 5. federal communications commission, in the matter of federal–state joint board on universal service: children’s internet protection act, fcc order 03-188 (washington, d.c.: 2003). 6. cipa. 7. roth v. united states, 354 u.s. 476 (1957). 8. miller v. california, 413 u.s. 15 (1973). 9. roth v. united states. 10. cipa. 11. cipa. 12. telecommunications act of 1996, public law 104-104 (feb. 8, 1996). 13. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e-rate program and libraries and library consortia, 2000–2004: trends and issues,” information technology & libraries 24, no. 2 (2005): 57–67. 14. ibid. 15. ibid. 16. american library association, “u.s. supreme court arguments on cipa expected in late winter or early spring,” press release, nov. 13, 2002, www.ala.org/ala/aboutala/hqops/ pio/pressreleasesbucket/ussupremecourt.cfm (accessed may 19, 2008). 17. kelly rodden, “the children’s internet protection act in public schools: the government stepping on parents’ toes?” fordham law review 71 (2003): 2141–75. 18. john carlo bertot, paul t. jaeger, and charles r. mcclure, “public libraries and the internet 2007: issues, implications, and expectations,” library & information science research 30 (2008): 175–184; charles r. mcclure, paul t. jaeger, and john carlo bertot, “the looming infrastructure plateau?: space, funding, connection speed, and the ability of public libraries to meet the demand for free internet access,” first monday 12, no. 12 (2007), www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ article/view/2017/1907 (accessed may 19, 2008). 19. mccarthy, “filtering the internet.” 20. leigh s. estabrook and edward lakner, “managing internet access: results of a national survey,” american libraries 31, no. 8 (2000): 60–62. 21. alberta davis comer, “studying indiana public libraries’ usage of internet filters,” computers in libraries (june 2005): 10–15; thomas m. reddick, “building and running a collaborative internet filter is akin to a kansas barn raising,” computers in libraries 20, no. 4 (2004): 10–14. 22. communications decency act of 1996, public law 104-104 (feb. 8, 1996). 23. child online protection act (copa), public law 105-277 (oct. 21, 1998). 24. united states v. 
american library association. 25. r. trevor hall and ed carter, “examining the constitutionality of internet filtering in public schools: a u.s. perspective,” education & the law 18, no. 4 (2006): 227–45; mccarthy, “filtering the internet.” 26. library services and technology act, public law 104-208 (sept. 30, 1996). 27. john wells and laurie lewis, internet access in u.s. public schools and classrooms: 1994–2005, special report prepared at the request of the national center for education statistics, nov. 2006. 28. american library association, libraries connect communities; john carlo bertot, charles r. mcclure, and paul t. jaeger, “the impacts of free public internet access on public library patrons and communities,” library quarterly 78, no. 3 (2008): 285–301; jaeger et al., “cipa.” 29. wells and lewis, internet access in u.s. public schools and classrooms. 30. ibid. 31. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 32. jaeger et al., “cipa.” 33. american library association, libraries connect communities. 34. ibid. 35. ibid. 36. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 37. ibid. 38. norman oder, “$40 million in e-rate funds suspended: delays caused as fcc requires new accounting standards,” library journal 129, no. 18 (2004): 16; debra lau whelan, “e-rate funding still up in the air: schools, libraries left in the dark about discounted funds for internet services,” school library journal 50, no. 11 (2004): 16. 39. ken foskett and paul donsky, “hard eye on city schools’ hardware,” atlanta journal-constitution, may 25, 2004; ken foskett and jeff nesmith, “wired for waste: abuses tarnish e-rate program,” atlanta journal-constitution, may 24, 2004. 40. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 41. department of commerce, national telecommunication and information administration, children’s internet protection act: study of technology protection measures in section 1703, report to congress (washington, d.c.: 2003). 42. mccarthy, “filtering the internet.” 43. paul t. jaeger and charles r. mcclure, “potential legal challenges to the application of the children’s internet protection act (cipa) in public libraries: strategies and issues,” first monday 9, no. 2 (2004), www.firstmonday.org/issues/issue9_2/jaeger/index.html (accessed may 19, 2008). 44. electronic frontier foundation, internet blocking in public schools (washington, d.c.: 2004), http://w2.eff.org/censorship/censorware/net_block_report (accessed may 19, 2008). 45. adam horowitz, “the constitutionality of the children’s internet protection act,” st. thomas law review 13, no. 1 (2000): 425–44. 46. tanessa cabe, “regulation of speech on the internet: fourth time’s the charm?” media law and policy 11 (2002): 50–61; adam goldstein, “like a sieve: the child internet protection act and ineffective filters in libraries,” fordham intellectual property, media, and entertainment law journal 12 (2002): 1187–1202; horowitz, “the constitutionality of the children’s internet protection act”; marilyn j. maloney and julia morgan, “rock and a hard place: the public library’s dilemma in providing access to legal materials on the internet while restricting access to illegal materials,” hamline law review 24, no. 2 (2001): 199–222; mary minow, “filters and the public library: a legal and policy analysis,” first monday 2, no.
12 (1997), www .firstmonday.org/issues/issue2_12/minnow (accessed may 19, 2008); richard j. peltz, “use ‘the filter you were born with’: the unconstitutionality of mandatory internet filtering for adult patrons of public libraries,” washington law review 77, no. 2 (2002): 397–479. 47. mccarthy, “filtering the internet.” 48. john carlo bertot et al., “public access computing and internet access in public libraries: the role of public libraries in e-government and emergency situations,” first monday 11, no. 9 (2006), www.firstmonday.org/issues/issue11_9/bertot (accessed may 19, 2008); john carlo bertot et al., “drafted: i want you to deliver e-government,” library journal 131, no. 13 (2006): 34–39; paul t. jaeger and kenneth r. fleischmann, “public libraries, values, trust, and e-government,” information technology and libraries 26, no. 4 (2007): 35–43. 49. doug johnson, “maintaining intellectual freedom in a filtered world,” learning & leading with technology 32, no. 8 (may 2005): 39. 50. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities.” 51. jaeger et al., “public libraries and internet access across the united states.” 52. paul t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41. 53. goldstein, “like a sieve.” 54. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities”; jaeger and fleischmann, “public libraries, values, trust, and e-government.“ 55. bertot, jaeger, and mcclure, “public libraries and the internet 2007”; charles r. mcclure et al., “funding and expenditures related to internet access in public libraries,” information technology & libraries (forthcoming). 56. zheng yan and kurt w. fischer, “how children and adults learn to use computers: a developmental approach,” new directions for child and adolescent development 105 (2004): 41–61. 57. zheng yan, “age differences in children’s understanding of the complexity of the internet,” journal of applied developmental psychology 26 (2005): 385–96; yan, “limited knowledge and limited resources”; yan, “differences in basic knowledge and perceived education of internet safety”; yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?” 58. patricia greenfield and zheng yan, “children, adolescents, and the internet: a new field of inquiry in developmental psychology,” developmental psychology 42 (2006): 391–93. 59. john n. gathegi, “the public library as a public forum: the (de)evolution of a legal doctrine,” library quarterly 75 (2005): 12. 60. sandra braman, “where has media policy gone? defining the field in the 21st century,” communication law and policy 9, no. 2 (2004): 153–82; sandra braman, change of state: information, policy, & power (cambridge, mass.: mit pr., 2007); charles r. mcclure and paul t. jaeger, “government information policy research: importance, approaches, and realities,” library & information science research 30 (2008): 257–64; milton mueller, christiane page, and brendan kuerbis, “civil society and the shaping of communication-information policy: four decades of advocacy,” information society 20, no. 3 (2004): 169–85. 61. paul t. jaeger, “information policy, information access, and democratic participation: the national and international implications of the bush administration’s information politics,” government information quarterly 24 (2007): 840–59. 
62. mcclure and jaeger, “government information policy research.” editorial board thoughts the june issue of ital featured a new column entitled editorial board thoughts. the column features commentary written by ital editorial board members on the intersection of technology and libraries. in the june issue kyle felker made a strong case for gerald zaltman’s book how customers think as a guide to doing user-centered design and assessment in the context of limited resources and uncertain user needs. in this column i introduce another factor in the library–it equation, that of rapid technological change. in the midst of some recent spring cleaning in my library i had the pleasure of finding a report documenting the current and future it needs of purdue university’s hicks undergraduate library. the report is dated winter 1995. the following summarizes the hicks undergraduate library’s it resources in 1995: [the library] has seven public workstations running eight different databases and using six different search software programs. six of the stations support a single database only; one station supports one cd-rom application and three other applications (installed on the hard drive). none of the computers runs windows, but the current programs do not require it. five stations are equipped with six-disc cd-rom drives. we do not anticipate that we will be required to upgrade to windows capability in the near future for any of the application programs. today the hicks undergraduate library’s it resources are dramatically different. as opposed to seven public workstations, we have more than seventy computers distributed throughout the library and the digital learning collaboratory, our information commons. this excludes forty-six laptops available for patron checkout and eighty-eight laptops designated for instructional use. we have moved from eight cd-rom databases to more than four hundred networked databases accessible throughout the purdue university libraries, campus, and beyond. as a result, there are hundreds of “search software programs”—doesn’t that phrase sound odd today?—including the library databases, the catalog, and any number of commercial search engines like google. today all, or nearly all, of our machines run windows, and the macs have the capability of running windows. in addition to providing access to databases, our machines are loaded with productivity and multimedia software allowing students to consume and produce a wide array of information resources. beyond computers, our library now loans out additional equipment including hard drives, digital cameras, and video cameras. the 1995 report also includes system specifications for the computers. these sound quaint today. of the seven computers, six were 386 machines with processors clocking in at 25 mhz. the computers had between 640k and 2.5mb of ram with hard drives with capacities between 20 and 60mb. the seventh computer was a 286 machine, probably with a 12.5 mhz processor and correspondingly smaller memory and hard disc capacity. the report does not include monitor specifications, though, based on the time, they were likely fourteen- or fifteen-inch cga or ega cathode ray tube monitors. modern computers are astonishingly powerful in comparison. according to a member of our it unit, the computers we order today have 2.8 ghz dual core processors, 3gb of ram, and 250gb hard drives.
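as a quick cross-check of the comparison that follows, the ratios can be computed directly (a minimal sketch in python; it assumes the 25 mhz / 2.5 mb ram / 60 mb disk machine above as the 1995 baseline and uses decimal megabytes throughout):

```python
# 1995 baseline (386 workstation) versus a 2008 desktop, figures taken from the text
old = {"cpu_mhz": 25, "ram_mb": 2.5, "disk_mb": 60}
new = {"cpu_mhz": 2800, "ram_mb": 3000, "disk_mb": 250000}

for part in old:
    print(f"{part}: {new[part] / old[part]:,.0f}x")  # 112x, 1,200x, 4,167x

# moore's law benchmark: one doubling every two years allows six full doublings
# in thirteen years, i.e. a sixty-four-fold increase
print("moore factor:", 2 ** (13 // 2))
```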
this equates to being 112 times faster, 1,200 times more ram, and hard drives that are 4,167 times larger than the 1995 computers! as a benchmark, consider moore’s law, a doubling of capacitors every two years, a sixty-four-fold increase over a thirteen-year period. who would have thought that library computers would outpace moore’s law?! today’s computers are also smaller than those of 1995. our standard desktop machines serve as an example, but perhaps not as dramatically as laptops, mini-laptops, and any of the mobile computing machines small enough to fit into your pocket. monitors are smaller, though also bigger. each new computer we order today comes standard with a twenty-inch flat panel lcd monitor. it is smaller in terms of weight and overall size, but the viewing area is significantly larger. these trends are certainly not unique to purdue. nearly every other academic library could boast similar it advancements. with this in mind, and if moore’s law continues as projected, imagine the computer resources that will be available on the average desktop machine—although one wonders if it will in fact be a desktop machine—in the next thirteen years. what things out on the distant horizon will eventually become commonplace? here the quote from the 1995 report about windows is particularly revealing. what things that are currently state-of-the-art will we leave behind in the next decade? what’s dos? what’s a cd-rom? will we soon say, what’s a hard drive? what’s software? what’s a desktop computer? in the last thirteen years we have also witnessed the widespread adoption and proliferation of the internet, the network that is the backbone for many technologies that have become essential components of physical and digital libraries. earlier this year, i co-authored an arl spec kit entitled social software in libraries.1 the survey reports on the usage of ten types of social software within arl libraries: (1) social networking sites like myspace and facebook; (2) media sharing sites like youtube and flickr; (3) social bookmarking and tagging sites like del.icio.us and librarything; (4) wikis like wikipedia and library success: a best practices wiki; (5) blogs; (6) rss used to syndicate content from webpages, blogs, podcasts, etc.; (7) chat and instant messenger services; (8) voice over internet protocol (voip) services like googletalk and skype; (9) virtual worlds like second life and massively multiplayer online games (mmogs) like world of warcraft; and (10) widgets either developed by libraries, like facebook applications and firefox catalog search extensions, or implemented by libraries, like meebome and firefox plugins. of the 64 arl libraries that responded, a 52% response rate, 61 (95% of respondents) said they are using social software. of the three libraries not using social software, two indicated they plan to do so in the future. in combination then, 63 out of 64 respondents (98%) indicated they are either currently using or planning to use social software. as part of the survey there was a call for examples of social software used in libraries.
of the 370 examples we received, we selected around 70 for publication in the spec kit. the examples are captivating and they illustrate the wide variety of applications in use today. of the ten social software applications in the spec kit, how many of them were at our disposal in 1995? by my count three: chat and instant messenger services, voip, and virtual worlds such as text-based muds and moos. of these three, how many were in use in libraries? very few, if any. in our survey we asked libraries for the year in which they first implemented social software. the earliest applications were cu-seeme, a voip chat service at cornell university in 1996, im at the university of california riverside in 1996 as well, and interoffice chat at the university of kentucky in 1998. the remaining libraries adopted social software in year 2000 and beyond, with 2005 being the most common year with 22 responses, or 34% of the libraries that had adopted social software. a look at this data shows that my earlier use of a thirteen-year time period to illustrate how difficult it is to project technological innovations that may prove disruptive to our organizations is too broad a time frame. perhaps we should scale this back to looking at five-year increments of time. using the spec kit data, in year 2003, a total of 16 arl libraries had adopted social software. this represents 25% of the total number of institutions that responded when we did our survey. (figure 1. responses to the question, “please enter the year in which your library first began using social software” (n=61).) this seems like a more reasonable time frame to be looking to the future. so, what does the future hold for it and libraries, whether it be thirteen or five years in the future? i am not a technologist by training, nor do i consider myself a futurist, so i typically defer to my colleagues. there are three places i look to for prognostications of the future. the first is lita’s top technology trends, a recurring discussion group that is a part of ala’s annual conferences and midwinter meetings. past top technology trends discussions can be found on lita’s blog (www.ala.org/ala/lita/litaresources/toptechtrends/toptechnology.cfm) and on lita’s website (www.ala.org/ala/lita/litaresources/toptechtrends/toptechnology.cfm). the second source is the horizon project, a five-year qualitative research effort aimed at identifying and describing emerging technologies within the realm of teaching and learning. the project is a collaboration between the new media consortium and educause. the horizon project website (http://horizon.nmc.org/wiki/main_page) contains the annual horizon reports going back to 2004. a final approach to projecting the future of it and libraries is to consider the work of our peers. the next library innovation may emerge from a sister institution. or perhaps it may take root at your local library first! reference 1. bejune, matthew m. and jana ronan. social software in libraries. arl spec kit 304. washington, d.c.: association of research libraries, 2008. matthew m. bejune (mbejune@purdue.edu) is an ital editorial board member (2007–09), assistant professor of library science at purdue university, and doctoral student in the graduate school of library and information science at the university of illinois at urbana–champaign. a candid look at collected works: challenges of clustering aggregates in glimir and frbr gail thornburg abstract creating descriptions of collected works in ways consistent with clear and precise retrieval has long challenged information professionals.
this paper describes problems of creating record clusters for collected works and distinguishing them from single works: design pitfalls, successes, failures, and future research. overview and definitions the functional requirements for bibliographic records (frbr) was developed by the international federation of library associations (ifla) as a conceptual model of the bibliographic universe. frbr is intended to provide a more holistic approach to retrieval and access of information than any specific cataloging code. frbr defines a work as a distinct intellectual or artistic creation. put very simply, an expression of that work might be published as a book. in frbr terms, this book is a manifestation of that work.1 a collected work can be defined as “a group of individual works, selected by a common element such as author, subject or theme, brought together for the purposes of distribution as a new work.”2 in frbr, this type of work is termed an aggregate or “manifestation embodying multiple distinct expressions.”3 zumer describes an aggregate as “a bibliographic entity formed by combining distinct bibliographic units together.”4 here the terms are used interchangeably. in frbr, the definition of aggregates applies only to group 1 entities, i.e., not to groups of persons or corporate bodies. the ifla working group on aggregates has defined three distinct types of aggregates: (1) collections of expressions, (2) aggregates resulting from augmentation or supplementing of a work with additional material, and (3) aggregates of parallel expressions of one work in multiple languages.5 while noting the relationships between the categories, this paper will focus on the first type. aggregates of the first type include selections, anthologies, series, books with independent sections by different authors, and so on. aggregates may occur in any format, from a volume containing both of the j. d. salinger works catcher in the rye and franny and zooey to a sound recording containing popular adagios from several composers to a video containing three john wayne movies. gail thornburg (thornbug@oclc.org) is consulting software engineer and researcher at oclc, dublin, ohio. the environment the oclc worldcat database is replete with bibliographic records describing aggregates. it has been estimated that the database may contain more than 20 percent aggregates.6 this proportion may increase as worldcat coverage of recordings and videos tends to increase. in the global library manifestation identifier (glimir) project, automatic clustering of the records into groups of instances of the same manifestation of a work was devised. glimir finds and groups similar records for a given manifestation and assigns two types of identifiers for the clusters. the first type is a manifestation id, which identifies parallel records differing only in language of cataloging or metadata detail, some of which are probably true duplicates that cannot be safely merged by a machine process. the second type is a content id, which describes a broader clustering, for instance, physical and digital reproductions and reprints of the same title from differing publishers. this process started with the searching and matching algorithms developed for worldcat.
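to make the two levels of identifier concrete, here is a minimal sketch (hypothetical record fields and grouping keys; the production glimir comparison is far more detailed than simple key matching):

```python
# toy records: 1 and 2 describe the same printing with different cataloging
# languages, while 3 is a later reprint of the same title from another publisher
records = [
    {"num": 1, "title": "moby dick", "publisher": "harper", "year": 1851, "cat_lang": "eng"},
    {"num": 2, "title": "moby dick", "publisher": "harper", "year": 1851, "cat_lang": "fre"},
    {"num": 3, "title": "moby dick", "publisher": "penguin", "year": 1992, "cat_lang": "eng"},
]

def assign_ids(records, key_fn, prefix):
    """give every record sharing a key the same cluster identifier."""
    ids, result = {}, {}
    for rec in records:
        key = key_fn(rec)
        ids.setdefault(key, f"{prefix}{len(ids) + 1}")
        result[rec["num"]] = ids[key]
    return result

# manifestation id: parallel records for one manifestation (cataloging language ignored)
print(assign_ids(records, lambda r: (r["title"], r["publisher"], r["year"]), "man"))
# -> {1: 'man1', 2: 'man1', 3: 'man2'}

# content id: broader clustering that also pulls in reprints of the same title
print(assign_ids(records, lambda r: r["title"], "con"))
# -> {1: 'con1', 2: 'con1', 3: 'con1'}
```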
the glimir clustering software is a specialization of the matching software developed for the batch loading of records to worldcat, deduplicating the database, and other search and comparison purposes.7 this form of glimirization compares an incoming record to database search results to determine what should match for glimir purposes. this is a looser match in some respects than what would be done for merging duplicates. the initial challenges of tailoring matching algorithms to suit the needs of glimir have been described in thornburg and oskins8 and in gatenby et al.9 the goals of glimir are (1) to cluster together different descriptions of the same resource and to get a clearer picture of the number of actual manifestations in worldcat so as to allow the selection of the most appropriate description, and (2) to cluster together different resources with the same content to improve discovery and delivery for end users. according to richard greene, “the ultimate goal of glimir is to link resources in different sites with a single identifier, to cluster hits and thereby maximize the rank of library resources in the web sphere.”10 glimir is related conceptually to the frbr model. if the goal of frbr is to improve the grouping of similar items for one work, then glimir similarly groups items within a given work. manifestation clusters specify the closest matches. content clusters contain reproductions and may be considered to represent elements of the expression level of the frbr model. the frbr and glimir algorithms this paper discusses have evolved significantly over the past three years. in addition, it should be recognized that the frbr algorithms use a map/reduce keyed approach to cluster frbr works and some glimir content, while the full glimir algorithms use a more detailed and computationally expensive record comparison approach. the frbr batch process starts with worldcat enhanced with additional authority links, including the production glimir clusters. it makes several passes through worldcat, each pass constructing keys that pull similar records together for comparison and evaluation. as described by toves, “successive passes progressively build up knowledge about the groups allowing us to refine and expand clusters, ending up with the work, content and manifestation clusters to feed into production.”11 each approach to clustering has its limits of feasibility, but the frbr and glimir combined teams have endeavored to synchronize changes to the algorithms and to share insights. some materials are easier to cluster using one approach, and some with the other. clustering meets aggregates in the initial implementation of glimir, the issue of handling collected works was considered out of scope for the project. with experience, the team realized there can be no effective automatic glimir clustering if collected works are not identified and handled in some way. why is this? suppose a record exists for a text volume containing work a. this matches to a record containing work a, but actually also containing work b. this matches to a work containing b and also containing works c, d, and e. the effect is a snowballing of cluster members that serves no one. how could this happen? in a bibliographic database such as worldcat, items representing collected works can be catalogued in several ways. efforts to relax matching criteria in just the right degree to cluster records for the same work are difficult to devise and apply.
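the snowballing effect described above is easy to reproduce. in the toy sketch below (hypothetical data; the real comparison logic is much richer), a pairwise rule of "cluster any two records that share a work" is closed transitively with a simple union-find, and the single-work record is dragged into one oversized cluster:

```python
# r1 describes the single work a; r2 and r3 are collected works
records = {
    "r1": {"a"},
    "r2": {"a", "b"},
    "r3": {"b", "c", "d", "e"},
}

parent = {r: r for r in records}

def find(x):
    # follow parent pointers to the cluster representative
    while parent[x] != x:
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

# pairwise rule: any two records sharing at least one work are matched
ids = list(records)
for i, x in enumerate(ids):
    for y in ids[i + 1:]:
        if records[x] & records[y]:
            union(x, y)

clusters = {}
for rec in records:
    clusters.setdefault(find(rec), []).append(rec)
print(clusters)  # one cluster containing r1, r2, and r3, spanning works a through e
```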
the glimir and frbr teams consulted several times to discuss clustering strategies for works, content, and manifestation clusters. practical experience with glimir led to rounds of enhancements and distinctions to improve the software’s decisions. while glimir clusters can and have been undone and redone on more than one occasion, it took experience from the team to realize that the clues to a collected work must be recognized. bible and beowulf as with many initial production startups, the output of glimir processing was monitored. reports for changes in any clusters of more than fifty were reviewed by quality control catalogers for suspicious combinations. and occasionally a library using a glimir- or frbr-organized display would report a strange cluster. this was the case with a huge malformed cluster of records for the bible. such a work set tends to be large and unmanageable by nature; there are a huge number of records for the bible in worldcat. however, it was noticed the set had grown suddenly over the previous two months. user interface applications stalled when attempting to present a view organized by such a set. one day, a local institution reported that a record for beowulf had turned up in this same work set. this started the team on an investigation. after much searching and analysis of the members of this cluster, the index case was uncovered. in many cases bibliographic records are allowed to cluster based on a uniform title. what the team found connecting these disparate records was a totally unexpected use of the uniform title, field 240 subfield a, contents: “b.”. that’s right, “b.”. once the first case was located, it was not hard to figure out that there were numerous uniform “titles” with other single letters of the alphabet. so in this odd usage, bible and beowulf could come together, if insufficient data were present in two records to discriminate by other comparisons. or potentially, so could other titles which started with “b.” seeing this unanticipated use of the uniform title field, the frbr and glimir algorithms were promptly modified to beware. the frbr and glimir clusters were then unclustered and redone. this was a data issue, and unanticipated uses of fields in a record will crop up, if usually with less drama. further experience showed more. in the examination of another ill-formed cluster, a reviewer realized that one record had the uniform title stated as “illiad” but the item title was homer’s “odyssey.” of course these have the same author, and may easily have the same publisher. even the same translator (e.g., richmond lattimore) is not improbable for a work like this. this was a case of bad data, but it imploded two very large clusters. music and identification of collected works as music catalogers know, musical works are very frequently presented in items that are collections of works. the rules for creating bibliographic records for music, whether scores or recordings or other, are intricate. the challenges to software to distinguish minor differences in wording from critical differences seem to be endless. moreover, musical sound recordings are largely collected works due to the nature of publication. as noted by papakhian, personal author headings are repeated more often in sound recording collections than in the general body of materials.12 there are several factors that may contribute to such an observation.
there are likely to be numerous recordings by the same performer of different works and numerous records of the same work by different performers. composers are also likely to be performers. the point is, for sound recordings an author statement and title may be less effective discriminators than for printed materials. vellucci13,14 and riley15 have written extensively on the problems of music in frbr models. the problems of distinguishing and relating whole/part relationships are particularly tricky. musical compositions often consist of units or segments that can be performed separately. so they are generally susceptible to extraction. these extractive relationships are seen in cases where parts are removed from the whole to exist separately, or perhaps parts for a violin or other instrument are extracted from the full score. software must be informed with rules as to significant differences in description of varying parts and varying descriptions of instruments, and in this team’s experience that is particularly difficult. krummel has noted that the bibliographic control of sound recordings has a dimension beyond item and work, that is, performance.16 different performances of the same beethoven symphony need to be distinguished. cast and performer list evaluation and dates checking are done by the software. however, the comparisons the software can make are sensitive to the fullness or scarcity of data provided in the bibliographic record. there is great variation observed in the numbers of cast members stated in a record. translator and adapter information can prove useful for the same sort of role discrimination in other types of materials. this is close scrutiny of a record. at the same time consider that an opera can include the creative contributions of an author (plot), a librettist, and a musical composer. yet these all come together to provide one work, not a collected work. tillett has categorized seven types of bibliographic relationships among bibliographic entities, including the following: 1. equivalence, as exact copies or reproductions of a work; photocopies and microforms are examples. 2. derivative relationships, or a modification such as variations, editions, translations. 3. descriptive, as in criticism, evaluation, review of a work. 4. whole/part, such as the relation of a selection from an anthology. 5. accompanying, as in a supplement or concordance or augmentation to a work. 6. sequential, or chronological relationships. 7. shared characteristic relationships, as in items not actually related that share a common author, director, performer, or other role.17 while it is highly desirable for a software system to notice category 1 to cluster different records for the same work, that same software could be confused by “clues,” such as in category 7. and the software needs to understand the significance of the other categories in deciding what to group and what to split. to handle these relations in bibliographic records, tillett discusses linking devices including, for instance, uniform titles. yet uniform titles are used for the categories of equivalence relationships, whole/part relationships, and derivative relationships. this becomes more and more complex for a machine to figure out. of course, uniform titles within bibliographic records are supposed to link to authority records via text string only.
consideration should ideally be given to linking via identifiers, as has been suggested elsewhere.18 thematic indexes a review of glimir clusters for scores and recordings showed a case where haydn’s symphonies a and b were brought together. these were outside the traditional canon of the 104 haydn symphonies and were referred to as “a” and “b” by the haydn scholar h. c. robbins landon. this misclustering highlighted the need for additional checks in the software. the original glimir software was not aware of thematic indexes as a tool for discrimination. thematic indexes are numbering systems for the works of a composer. the köchel mozart catalog, as in k. 626, is a familiar example. these designations are not globally unique; that is, they are intended to be unique for a given composer, but identical designators may coincidentally have been assigned to multiple composers. while “b” series numbers may be applied to works of chambonnières, couperin, dvořák, pleyel, and others, the presence of more than one b number is suggestive of collected work status. for more on the various numbering systems, see the interesting discussion by the music library association.19 however, the software cannot merely count likely identifiers in the usual place. this could lead to falsely flagging aggregates; one work by dvořák could have b.193, which is incidentally equivalent to opus 105. clearly, any detection of multiple identifiers of this sort must be restricted to identifiers of the same series. string quartet number 5, or maybe 6 cases of renumbering can cause problems in identifying collected works. an early suppressed or lost work, later discovered and added to the canon of the composer’s work, can cause renumbering of the later works. clustering software must be very attentive to discrete numbers in music, but can it be clever enough? the works of paul hindemith (1895–1963) offer an example. his first string quartet was written in 1915, but long suppressed. his publisher was generally schott. long after hindemith’s death, this first quartet was unearthed, and then was published by schott. the publisher then renumbered all the quartets. so quartets previously 1 through 6 became 2 through 7. the rediscovered work was then called “no. 1,” though sometimes called “no. 0” to keep the older numbering intact. further, the last two quartets did not even have opus numbers assigned and were both in the same key.20 this presents a challenge. anything musical another problem case emerged when reviewers noticed a cluster contained both the unrelated songs “old black joe” and “when you and i were young maggie.” on investigation, the cluster held a number of unrelated pieces. here the use of alternate titles in a 246 field had led to overclustering, and the rules for use of 246 fields were tightened in frbr and glimir. as in the other problem cases, cycles of testing were necessary to estimate sufficient yet not excessive restrictions. rules too strict split good clusters and defeat the purpose of frbr and glimir. at this point the glimir/frbr team recognized that rules changes were necessary but not sufficient. that is, a concerted effort to handle collected works was essential. strategies for identifying collected works the greatest problem, and most immediate need, was to stop the snowballing of clusters.
clusters containing some member records that are collected works can suddenly mushroom out of control. rule 1 was that a record for a collected work must never be grouped with a record for a single work. if all in a group are collected works, that is closer to tolerable (more on that later). with time and experimentation, a set of checks was devised to allow collected works to be flagged. these clues were categorized into two types: (1) conclusive evidence, or (2) partial evidence. type 2 needed another piece of evidence in the record. finding the best clues was a team effort. it was acknowledged that to prevent overclustering, overidentification of aggregates was preferable to failure to identify them. several cycles of tests were conducted and reviewed, assessing whether the software guessed right. table 1 illustrates the types of checks done for a given bibliographic record. here the “$” is used as an abbreviation for subfield, and “ind” equals indicator.

area: uniform title | field: 240 | rule: $a present with no $m, $n, $p, or $r, and the title in $a is on a list of collective terms, is collected work | notes: this is a long list of terms such as “symphonies,” “plays,” “concertos,” and so on.
area: title | field: 245 | rule: contains “selections,” is collected; 245 with multiple semicolons and doc type “rec,” is collected
area: title | field: 246 | rule: if four or more 246 fields with ind2 = 2, 3, or 4, is collected | notes: if more than one 246, consider partial evidence
area: extent | field: 300 | rule: if 300 $a has “pagination multiple” or “multiple pagings,” is collected
area: contents notes | field: 505 $a and $t | rule: 1. check $a for first and last occurrences of “movement”; if not multiple occurrences of “movement,” check whether it has multiple “ / ” patterns. 2. if the above doesn’t find multiple patterns, also look for “ ; ” patterns. 3. if the above checks don’t produce more than one pattern, look for multiple “ – ” patterns. 4. count 505 $t cases. 5. count $r cases. | notes: if all / any of the above produce more than one pattern instance, or more than one $t, or more than one $r, is collected.
area: thematic index clues | field: various fields; 505 $a | rule: if any 505 $a, check for differing opuses (this also checks for thematic index cases); if found, is collected | notes: for types score and recording
area: related work | field: 740 | rule: if one or more 740 and one has indicator 2 = “2,” is collected | notes: if only multiple 740s, partial evidence
area: author | field: 700/710/711/730 | rule: check for $t and $n, and check 730 ind2 value of “2”; if a 730 with ind2 = 2 or multiple $t is found, is collected | notes: if only one $t, partial evidence
area: author | field: 100/110/111, 700/710, 730 | rule: if format is recording and both records are collected works, require a cast list match to cluster anything but manifestation matches; that is, do not cluster at the content level without verifying by cast.

table 1. checks on bibliographic records.

frailties of collected works identification in well-cataloged records the above table illustrates many areas in a bibliographic record that can be mined for evidence of aggregates. the problem is that cataloging practice offers no one rule mandatory to catalog a collected work correctly. moreover, as worldcat membership grows, the use of multiple schemes of cataloging rules for different eras and geographic areas adds to the complexity, even assuming that all the bibliographic records are cataloged “correctly.” correct cataloging is not assumed by the team.
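before turning to the cases that defeated these checks, here is a minimal sketch of how a few of the table 1 clues might be combined (the record layout, the term list, and the two-clue threshold for partial evidence are simplified assumptions, not the production implementation):

```python
# a handful of collective uniform-title terms; the real list is much longer
COLLECTIVE_TERMS = {"symphonies", "plays", "concertos", "works", "selections"}

def flag_collected(rec):
    """return True if the record looks like a collected work."""
    conclusive = False
    partial = 0

    # 240: a collective term in $a with none of $m, $n, $p, $r is conclusive
    f240 = rec.get("240", {})
    if f240.get("a", "").rstrip(". ") in COLLECTIVE_TERMS and not any(
        code in f240 for code in ("m", "n", "p", "r")
    ):
        conclusive = True

    # 245: the word "selections" in the title statement is conclusive
    if "selections" in rec.get("245", ""):
        conclusive = True

    # 505 contents note: more than one $t (or more than one $r) is conclusive
    subfields = rec.get("505", [])  # list of (subfield code, value) pairs
    if sum(1 for code, _ in subfields if code == "t") > 1:
        conclusive = True
    if sum(1 for code, _ in subfields if code == "r") > 1:
        conclusive = True

    # 246: more than one variant title is only partial evidence
    if len(rec.get("246", [])) > 1:
        partial += 1

    # partial clues need corroboration from at least one other clue
    return conclusive or partial >= 2

sample = {"245": "piano concertos. selections", "246": ["first variant"]}
print(flag_collected(sample))  # True, via the 245 check
```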
software confounded with all the checks outlined in the table, the team still found cases of collected works that seemed to defy machine detection. one record had the two separate works, tom sawyer and huckleberry finn, in the same title field, with no other clues to the aggregate nature of the item. the work brustbild was another case. for this electronic resource set, brustbild appeared to be the collection set title, but the specific title for each picture was given in the publisher field. a cluster for the work gedichte von eduard mörike (score) showed problems with the uniform title, which was for the larger work, but the cluster records each actually represented parts of the work. the bad cluster for si ku quan shu zhen ben bie ji, an electronic resource, contained records which each appeared to represent the entire collection of 400 volumes, but the link in each 856 field pointed only to one volume in the set. limitations of the present approach the current processing rules for collected works adopt a strategy of containment. the problem may be handled in the near term by avoiding the mixing of collected works with noncollected works, but the clusters containing collected works need further analysis to produce optimal results. for example, it is one thing to notice “arrangements” in scores as a clue to the presence of an aggregate. the requirement also exists that an arrangement should not cluster with the original score. the rules for clustering and distinguishing different sets of arrangements present another level of complexity. checks to compare and equate the instruments involved in an arrangement are quite difficult; in this team’s experience, they fail more often than they succeed. without initial explication of the rules for separating arrangements, reviewers quickly found clusters such as haydn’s schöpfung, which included records for the full score, vocal score, and an arrangement for two flutes. an implementation that expects one manifestation to have the identifier of only one work is a conceptual problem for aggregates. a simple case: if the description of a recording of bernstein’s mass has an obscurely placed note indicating the second side contains the work candide, mass is likely to be dominant in the clustering effect, with the second work effectively “hidden.” this manifestation would seem to need three work ids, one for the combination, one for mass, and one for candide. this does not easily translate to an implementation of the frbr model but could perhaps be achieved via links. several layers of links would seem necessary. a manifestation needs to link to its collected work. a collected work needs links to records for the individual works that it contains, and vice versa, individual works need to link to collective works. this can be important for translations, for example, into russian, where collective works are common even where they do not exist in the original language. lessons learned first and foremost, plan to deal with collected works. for clustering efforts this must be addressed in some way for any large body of records. secondly, formats will need focused attention. the initial implementation of the glimir algorithms used test sets mainly composed of a specific work. after all, glimir clusters should all be formed within one work.
these sets were carefully selected to represent as many different types of work sets as possible, whether clear or difficult examples of work set members. plenty of attention was given to the compatibility of differing formats, given the looser content clustering. these were good tests of the software’s ability to cluster effectively and correctly within a set that contained numerous types of materials. random sets of records were also tested to cross check for unexpected side effects. what in retrospect the team would have expanded was sets that were focused on specific formats. recordings, scrutinized as a group, can show different problems than scores or books. the distinctions to be made are probably not complete. another lesson learned in glimir concerned the risks of clustering. the deliberate effort to relax the very conservative nature of the matching algorithms used in glimir was critical to success in clustering anything. singleton clusters don’t improve anyone’s view. in the efforts to decide what should and should not be clustered, it was initially hard to discern the larger scale risks of overclustering. risks from sparse records were probably handled fairly well in this initial effort, but risks from complex records needed more work. collected works are only one illustration of the risks of overclustering. future research the current research suggests a number of areas for possible further exploration: • the option for human intervention to rearrange clusters not easily clustered automatically would seem to be a valuable enhancement. • there is next the general question, what sort of processing is needed, and feasible, to distinguish the members of clusters flagged as collected works? • part versus whole relationships can be difficult to distinguish from the information in bibliographic records. further investigation of these descriptions is needed. • arrangements of works in music are so complex as to suggest an entire study by themselves. work on this area is in progress, but it needs rules investigation. • other derivative relationships among works: do these need consideration in a clustering effort? can and should they be brought together while avoiding overclustering of aggregates? • how much clustering of collected works may actually be helpful to persons or processes searching the database? how can clusters express relationships to other clusters? conclusion clustering bibliographic records in a database as large as worldcat takes careful design and undaunted execution. the navigational balance between underclustering and overclustering is never easy to maintain, and course corrections will continue to challenge the navigators. acknowledgments this paper would have been a lesser thing without the patient readings by rich greene, janifer gatenby, and jay weitz, as well as their professional insights and help in clarifying cataloging points. special thanks to jay weitz for explicating many complex cases in music cataloging and music history. references 1. barbara tillett, “what is frbr? a conceptual model for the bibliographic universe,” last modified 2004, accessed november 22, 2013, http://www.loc.gov/cds/frbr.html. 2. janifer gatenby, email message to the author, november 10, 2013. 3. international federation of library associations (ifla) working group on aggregates, final report of the working group on aggregates, september 12, 2011, http://www.ifla.org/files/assets/cataloguing/frbrrg/aggregatesfinalreport.pdf. 4.
checking out facebook.com: the impact of a digital trend on academic libraries
laurie charnigo and paula barnett-ellis
laurie charnigo (charnigo@jsu.edu) is an education librarian and paula barnett-ellis (pbarnett@jsu.edu) is a health, science, and nursing librarian at the houston cole library, jacksonville state university, alabama.
while the burgeoning trend in online social networks has gained much attention from the media, few studies in library science have addressed the topic in depth. this article reports on a survey of 126 academic librarians concerning their perspectives toward facebook.com, an online network for students. findings suggest that librarians are overwhelmingly aware of the "facebook phenomenon." those who are most enthusiastic about the potential of online social networking suggested ideas for using facebook to promote library services and events. few individuals reported problems or distractions as a result of patrons accessing facebook in the library. where problems have arisen, strict regulation of access to the site is viewed unfavorably. while some librarians were excited about the possibilities of facebook, the majority surveyed appeared to consider facebook outside the purview of professional librarianship.
during the fall of 2005, librarians noticed something unusual going on in the houston cole library (hcl) at jacksonville state university (jsu). students were coming into the library in droves. patrons waited in lines with photos to use the public-access scanner (a stack of discarded pictures quickly grew).
library traffic was noticeably busier than usual and the computer lab was constantly full, as were the public-access terminals. the hubbub seemed to center around one particular web site. once students found available computers, they were likely to stay glued to them for long stretches of time, mesmerized and lost in what was later determined to be none other than "facebook addiction." this addiction was all the more obvious the day the internet was down. withdrawal was severe. soon after the librarians noticed this curious behavior, an article in the chanticleer, the campus newspaper for jsu, dispelled the mystery surrounding the web site brouhaha. a campus reporter broke the exciting news to the jsu community that "after months of waiting and requests from across the country, it's finally here. jsu is officially on the facebook."1 the library suddenly became a popular hangout for students in search of computers to access facebook. apparently jsu jumped on the bandwagon relatively late. the facebook craze had already spread throughout other colleges and universities since the web site was founded in february 2004 by mark zuckerberg, a former student at harvard university. the creators of facebook vaguely define the site as "a social utility that connects you with the people around you."2 although originally created to allow students to search for other students at colleges and universities, the site has expanded to allow individuals to connect in high schools, companies, and within regions. recently, zuckerberg has also announced plans to expand the network to military bases.3 currently, students and alumni in more than 2,200 colleges and universities communicate, connect with other students, and catch up with past high school classmates daily through the network. students who may never physically meet on campus (a rather serendipitous occurrence in nature) have the opportunity to connect through facebook. establishing virtual identities by creating profiles on the site, students post photographs, descriptions of academic and personal interests such as academic majors, campus organizations of which they are members, political orientation, favorite authors and musicians, and any other information they wish to share about themselves. facebook's search engine allows users to search for students, faculty, and staff with similar interests by keyword. it would be hard to gauge how many of these students actually meet in person after connecting through facebook. the authors of this study have heard students mention that either they or their friends have made dates with other students on campus through facebook. many of the "friends" facebook users first add when they initially establish their accounts are the ones they are already acquainted with in the physical world. when facebook made its debut at jsu, it had become the "ninth most highly trafficked web site in the u.s."4 one source estimated that 85 percent of college students whose institutions are registered in facebook's directory have created personal profiles on the site.5 membership for the university network requires a university e-mail address, and an institution cannot be registered in the directory unless a significant number of students request that the school be added. currently, more than nine million people are registered on facebook.6 soon after jsu was registered on facebook's directory, librarians began to receive questions regarding use of the scanner and requests for help uploading pictures to facebook profiles.
students seemed surprisingly open about showing librarians their profiles, which usually contained more information than the librarians wanted to know. however, not all students were enthusiastic about facebook. complaints began to surface from students awaiting access to computers for academic work while classmates "tied up" computers on facebook. some students complained about the distraction facebook caused in the library's computer lab, a complaint that eventually reached the president of jsu. currently, the administration at jsu has decided to block access to facebook in the computer labs on campus, including the lab in the library. opinions of faculty and staff in the library about facebook vary. some librarians scoff at this new trend, viewing the site primarily as just another dating service. others have created their own facebook accounts just to see how it works, to connect with students, and to keep up with the latest internet fad.7
■ study rationale
prompted by the issues that have arisen at hcl as a result of heavy patron use of facebook, the authors surveyed academic librarians throughout the united states to find out what impact, if any, the site has had on other libraries. the authors sought information about the practical effect facebook has had on libraries, as well as librarians' perspectives on, perceived roles associated with, and awareness of internet social trends and their place in the library. online social networking, like e-mail and instant messaging, is emerging as a new method of communication. recently, the librarians have heard facebook being used as a verb (e.g., "i'll facebook you"). few would probably disagree that making social connections and friends (and facebook revolves around connecting friends) is an important aspect of the campus experience. much of the attraction students and alumni have toward college yearbooks (housed in the library) stems from the same fascination that viewing photos, student profiles, and searching for past and present classmates on facebook inspires. emphasis in this study centers on librarians' awareness of, experimentation with, and attitudes towards facebook and whether or not they have created policies to regulate or block access to the site on public-access computers. however trendy an individual web site such as facebook may appear, online social networking, a category facebook falls within, has become a new subject of inquiry for marketing professionals, sociologists, communication scholars, and library and information scientists. downes defines social networks as a "collection of individuals linked together by a set of relations."8 according to downes, "social networking web sites fostering the development of explicit ties between individuals as 'friends' began to appear in 2002."9 facebook is just one of many popular online social network sites (myspace, friendster, flickr), and survey respondents often asked why questions focused solely on facebook.
the authors decided to investigate it specifically because it is cur­ rently the largest online social network targeted for the academic environment. librarians are also increasingly exploring the use of what have loosely been referred to as “internet 2.0” com­ panies and services, such as facebook, to interact with and reach out to our users in new and creative ways. the term internet 2.0 was coined by o’reilly media to refer to internet services such as blogs, wikis, online social net­ working sites, and types of networks that allow users the ability to interact and provide feedback. o’reilly lists the core competencies that define internet 2.0 services. one of these competencies, which might be of particular inter­ est to librarians, is that internet 2.0 services must “trust the users” as “co­developers.”10 as librarians struggle to develop innovative ways to reach users beyond library walls, it seems logical to observe online services, such as facebook and myspace, which appeal to a huge portion of our clientele. from a purely evaluative standpoint of the site as a database, the authors were impressed by several of the search features offered in facebook. graph­theory algo­ rithms and other advanced network technology are used to process connections.11 some of the more interesting search options available in facebook include the ability to: ■ search for students by course field, class number, or section; ■ search for students in a particular major; ■ search for students in a particular student organiza­ tion or club; ■ create “groups” for student organizations, clubs, or other students with common interests; ■ post announcements about campus or organization events; ■ search specifically for alumni; and ■ block or limit who may view profiles, providing users with built­in privacy protection if the user so wishes. since the authors finished the study, the site has added a news feed and a mini feed, features that allow users to keep track of their friends’ notes, messages, profile changes, friend connections, and group events. in response to negative feedback about the news feeds and mini feeds by users who felt their privacy was being violated, facebook’s administrators created a way for users to turn off or limit information displayed in the feeds. the addition of this technology, however, provides a sophisticated level of connectivity that is a benefit to users who like to keep abreast of the latest happenings in their network of friends and groups. the pulse, another feature on the site, keeps daily track of popular interests (e.g., favorite books) and member demographics (number of members, political orientation) and compares them with overall facebook member averages. the authors were pleasantly surprised to discover that the beatles and led zeppelin, beloved bands of the baby boomers, article title | author 25checking out facebook.com | charnigo and barnett-ellis 25 continue to live on in the hearts of today’s students. these groups were ranked in the top ten favorite bands by stu­ dents at jsu. as of october 2006, the top campaign issues expressed by facebook users were: reducing the drinking age to eighteen (go figure) and legalization for same­sex marriage. arguably, much of the information provided by facebook is not academic in nature. however, an evaluation or review of facebook might provide useful information to instruction librarians and database ven­ dors regarding interface design and search capabilities that appeal to students. 
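as a purely illustrative aside, the connection processing described above can be pictured as simple set and graph operations over members, friendships, courses, and groups. the sketch below is a toy model under that assumption; the class, field, and member names are hypothetical and are not drawn from facebook's actual implementation, which the article does not describe.

```python
# toy model of a social graph supporting the kinds of lookups mentioned above:
# mutual friends, classmates by course and section, and group membership.
# all names and structures are hypothetical, for illustration only.
from collections import defaultdict

class ToySocialGraph:
    def __init__(self):
        self.friends = defaultdict(set)   # member -> set of friends (undirected)
        self.courses = defaultdict(set)   # (course, section) -> members enrolled
        self.groups = defaultdict(set)    # group name -> members

    def add_friendship(self, a, b):
        # friendship is mutual, so record the edge in both directions
        self.friends[a].add(b)
        self.friends[b].add(a)

    def enroll(self, member, course, section):
        self.courses[(course, section)].add(member)

    def join_group(self, member, group):
        self.groups[group].add(member)

    def mutual_friends(self, a, b):
        # intersection of two adjacency sets
        return self.friends[a] & self.friends[b]

    def classmates(self, member, course, section):
        return self.courses[(course, section)] - {member}

g = ToySocialGraph()
g.add_friendship("ann", "ben")
g.add_friendship("ann", "cal")
g.add_friendship("ben", "cal")
g.enroll("ann", "ls 101", "02")
g.enroll("cal", "ls 101", "02")
g.join_group("ann", "library book club")

print(g.mutual_friends("ann", "ben"))       # {'cal'}
print(g.classmates("ann", "ls 101", "02"))  # {'cal'}
```

even a model this small shows why searching by course section or shared group is cheap once memberships are stored as sets, which is consistent with the search options listed above.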
provitera­mcglynn suggests that facilitating learning among millennials, who “represent 70 to 80 million people” born after 1992 (a large percent­ age of facebook members) involves understanding how they interact and communicate.12 awareness of students’ cultural and social interests, and how they interact online, may help older generations of academic librarians better connect with their constituents. ■ the literature on online social networks although social networks have been the subject of study by sociologists for years and social network theories have been established to describe how these networks func­ tion, the study of online social networks has received little attention from the scholarly community. in 1997, garton, haythornthwaite, and wellman were among the first to describe a method, social network analysis, for studying online social networks.13 their work was published years before online social networks similar to facebook evolved. currently, the literature on these networks is predominantly limited to popular news pub­ lications, business magazines, occasional blurbs in library science and communications journals, and numerous student newspapers.14 privacy issues and concerns about sexual predators lurking on facebook and similar sites have been the focus of most articles. in the chronicle of higher education, read details numerous arrests, suspensions, and schol­ arship withdrawals that have resulted from police and administrators searching for incriminating information students have posted in facebook.15 read discovered that, because students naively reveal so much informa­ tion about their activities, some campus police were regularly trolling facebook, finding it “an invaluable ally in protecting their campuses.”16 students may feel a false sense of security when they post to facebook, regarding it as their private space. however, read warns that “as more and more colleges offer alumni e­mail accounts, and as campus administrators demonstrate more internet savvy, students are finding that their conversations are playing to a wider audience than they may have antici­ pated.”17 privacy concerns expressed about facebook appear to revolve more around surveillance than stalk­ ers. in a web seminar on issues regarding facebook use in higher education, shawn mcguirk, director of judicial affairs, mediation, and education at fitchburg state college, massachusetts, recommends that administrators and others concerned with students posting potentially incriminating, embarrassing, or overtly personal infor­ mation draft a document similar to the one created by cornell university’s office of information technologies, which advises students on how to safely and responsibly use online social networking sites similar to facebook.18 after pointing out the positive benefits of facebook and reassuring students that cornell university is proud of its liberal policy in not monitoring online social networks, the essay, entitled “thoughts on facebook,” provides poignant advice and examples of privacy issues revolv­ ing around facebook and similar web sites.19 the golden rule of this essay states: don’t say anything about someone else that you would not want said about yourself. and be gentle with your­ self too! 
what might seem fun or spontaneous at 18, given caching technologies, might prove to be a liability to an on­going sense of your identity over the longer course of history.20 a serious concern discussed in this document is the real possibility that potential employers may scan facebook profiles for the “real skinny” on job candidates. however, unless the employer uses an e­mail issued from the same school as the candidate, he or she is unable to look at the individual’s full profile without first request­ ing permission from the candidate to be added as a “friend.” all the employer is able to view is the user’s name, school affiliation, and picture (if the user has posted one). unless the user has posted an inappropriate picture or is applying for a job at the college he or she is attending, the threat of employers snooping for informa­ tion on potential candidates in facebook is minimal. the same, however, cannot be said of myspace, which is much more open and accessible to the public. additionally, three pilot research studies have also focused on privacy issues specifically relating to facebook, including those of stutzman, gross and acquisti, and govani and pashley. results from all three studies revealed strikingly close findings. individuals who participated in the studies seemed willing to dis­ close personal information about themselves—such as photos and sometimes even phone numbers and mailing addresses—on facebook profiles even though students also seemed to be aware that this information was not secure. in a study of fifty carnegie mellon university undergraduate users, govani and pashley concluded that these users “generally feel comfortable sharing their per­ sonal information in a campus environment. participants said they “had nothing to hide” and “they don’t really 26 information technology and libraries | march 200726 information technology and libraries | march 2007 care if other people see their information.”21 a separate study of more than four thousand facebook members at the same institution by gross and acquisti echoed these findings.22 comparing identity elements shared by members of facebook, myspace, friendster, and the university of north carolina directory, stutzman discov­ ered that a significant number of users shared personal information about themselves in online social networks, particularly facebook, which had the highest level of campus participation.23 gross and acquisti provide a list of explanations suggesting why facebook members are so open about sharing personal information online. three explanations that are particularly convincing are that “the perceived benefit of selectively revealing data to strang­ ers may appear larger than the perceived costs of possible privacy invasions”; “relaxed attitudes toward (or lack of interest in) personal privacy”; and “faith in the network­ ing service or trust in its members.”24 in public libraries, concern has primarily centered on teenagers accessing myspace.com, an online social net­ working site much larger than facebook. 
myspace, whose membership, unlike facebook, does not require an .edu e­mail address, has a staggering 43 million users, a num­ ber that continues to rise.25 julian aiken, a reference librar­ ian at the new haven free public library, wrote about the unpopular stance he took when his library decided to ban access to myspace due to the hysterical hype of media reports exposing the dangers from online predators lurking on the site.26 for aiken, the damage of censorship policies in libraries far outweighs the potential risk of sex crimes. furthermore, he suggests that there are even edu­ cational benefits of myspace, observing that “[t]eenagers are using myspace to work on collaborative projects and learn the computer and design skills that are increasingly necessary today.”27 what is apparent is that whether facebook continues to rise in popularity or fizzles out among the college crowd, the next generation of college students, who now constitute the largest percentage of myspace users, are already solidly entrenched and adept at using online social networks. librarians in institutions of higher education might need to consider what implica­ tions the communication style preferences of these future students could have, if any, on library services. while most of the academic attention regarding online social networks has centered on privacy concerns, perhaps the business sector has done a more thorough investiga­ tion of user behavior and students’ growing attraction towards these types of sites. business magazines have naturally focused on the market potential, growth, and fluctuating popularity of various online social networks. advertisers and investors have sought ways to capital­ ize on the exponential growth of these high­traffic sites. business week reported that as of october 2005, facebook .com had 4.2 million members. more than half of those members were between the ages of twelve and twenty­ four.28 while some portended that the site was losing momentum, as of august 2006, membership on facebook had expanded beyond eight million.29 marketing experts have closely studied, apparently more so than com­ munication scholars, the behavior of users in online social networks. in a popular business magazine, hempel and lehman describe user behavior of the “myspace generation”: “although networks are still in their infancy, experts think they’re already creating new forms of social behavior that blur the distinctions between online and real­world interactions.”30 the study of user behavior in online social networks, however, has yet to be addressed in length by those outside the field of marketing. although evidence of interest in online social net­ works is apparent in librarian weblogs and forums (many librarians have created facebook groups for their libraries), actual literature in the field of library and information science is scarce.31 dvorak questions the lack of interest displayed by the academic community toward online social networks as a focus of scholarly research. calling on academics to “get to work,” he argues “aca­ demia, which should be studying these phenomena, is just as out of the loop as anyone over 30.”32 this discon­ nect is also echoed by michael j. 
bugeja, director of the greenlee school of journalism and communication at iowa state university, who writes, “while i’d venture to say that most students on any campus are regular visitors to facebook, many professors and administrators have yet to hear about facebook, let alone evaluate its impact.”33 the lack of published research articles on these types of networks, however, is understandable given the newness of the technology. a few members of the academic community have sug­ gested opportunities for using facebook to communicate with and reach out to students. in a journal specifically geared toward student services in higher education, shier considers the impact of facebook on campus community building.34 although she cannot identify an academic purpose for facebook, she describes how the site can con­ tribute to the academic social life of a campus. facebook provides students with a virtual campus experience, particularly in colleges where students are commuters or are in distance education. shier writes, “as the student’s definition of community moves beyond the geographic and physical limitations, facebook.com provides one way for students to find others with common interests, feel as though they are part of a large community, and also find out about others in their classes.”35 furthermore, facebook membership extends beyond students to fac­ ulty, staff, and alumni. shier cites examples of professors who used facebook to connect or communicate with their students, including the president of the university of iowa and more than one hundred professors at duke university. professors who teach online courses make article title | author 27checking out facebook.com | charnigo and barnett-ellis 27 themselves seem more human or approachable by estab­ lishing facebook profiles.36 greeting students on their own turf is exactly the direction staff at washington university’s john m. olin library decided to take when they hired web services librarian joy weese moll to communicate and answer questions through a variety of new technologies, includ­ ing facebook.37 brian mathews, information services librarian at georgia institute of technology, also created a facebook profile in order to “interact with the students in their natural environment.”38 mathews decided to experiment with the possibilities of using facebook as an outreach tool to promote library services to 1,700 stu­ dents in the school of mechanical engineering after he discovered that 1,300 of these students were registered on facebook. advising librarians to become proactive in the use of online social networks, mathews reported that overall, his experience helped him to effectively “expand the goal of promoting the library.”39 bill drew was among the first librarians to create an account and profile for his library, the suny morrisville library. as of september 2006, nearly one hundred librarians had created profiles or accounts for their libraries on facebook. one month later, however, the administration at facebook began shutting down library accounts on the grounds that libraries and institutions were not allowed to represent themselves with profiles as though they were individu­ als. in response, many of these libraries simply created groups for their libraries, which is completely appropri­ ate, similar to creating a profile, and just as searchable as having an account. the authors of this study created the “houston cole library users want answers!” group, which currently has ninety­one members. 
library news and information of interest about the library is announced in the group.40 in this study, one trend the authors will try to identify is whether other librarians have considered or are already using facebook in similar ways that moll, mathews, and drew have explored as avenues for com­ municating with students or promoting library services. ■ the survey in february 2006, 244 surveys were mailed to reference or public service librarians (when the identity of those per­ sons could be determined). these individuals were chosen from a random sample of the 850 institutions of higher education classified by the carnegie classification listing of higher education institutions as “master’s colleges and universities (i and ii)” and “doctoral/ research universities (extensive and intensive).”41 the sample size provided a 5.3 percent margin error and a 95 percent confidence level. one hundred twenty­six surveys were completed, providing a response rate of 51 percent. fifteen survey questions (appendix a) were designed to target three areas of inquiry: awareness of facebook, practical impact of the site on library services, and perspectives of librarians toward online social networks. awareness of facebook a series of questions on the survey queried respondents about their awareness and degree of knowledge about facebook. the overwhelming majority of librarians were aware of facebook’s existence. out of 126 librarians, 114 had at least heard of facebook; 24 were not familiar with the site. as one individual wrote, “i had not heard of facebook before your survey came, but i checked and our institution is represented in facebook.” universities registered in facebook are easily located through a search­by­region on facebook’s home page. thirty­eight colleges and universities for alabama (jsu’s location) are registered in facebook. (in comparison, 143 academic institutions in california are listed.) out of those librar­ ians who had heard of the site, 27 were not sure whether their institutions were registered in facebook’s directory. sixty survey participants were aware that their institu­ tions were registered in the directory, while fifteen librar­ ians reported that their universities were not registered (figure 1). several comments at the end of the survey indicated that some of the institutions surveyed did not issue school e­mail accounts, making membership in facebook impossible for their university. interestingly, out of the sixty individuals who could claim that their universities were in the directory, 34 percent have created their own personal facebook accounts and two libraries have individual profiles (figure 2). one individual who established an account on the site wrote, “personally, i’m a little embarrassed by having an account because it’s such a teeny­bopper kind of thing and i’m a little old for it. but it’s an interesting cultural phenomenon and academic librarians need to get on the bandwagon with it, if only to better understand their constituents.” another survey respondent with an individual profile on the site reported a group created by his or her institution on facebook titled “i totally want to have sex in the library.” this individual wanted to make it clear, however, that the students—not the librarians—created this group. 
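a brief aside on the sampling arithmetic reported at the start of this section: the quoted 5.3 percent margin of error at a 95 percent confidence level is consistent with the standard worst-case formula plus a finite population correction applied to the 244 surveys mailed from a population of 850 institutions. the sketch below is a plausible reconstruction of that calculation, not the authors' own worksheet.

```python
# reconstructing the reported margin of error for a sample drawn from a
# finite population of institutions.  assumes the conventional worst-case
# proportion p = 0.5; this is an illustration, not the authors' calculation.
import math

def margin_of_error(sample_size, population_size, z=1.96, p=0.5):
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    # finite population correction: sampling a large share of a small
    # population reduces the sampling error
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc

print(round(margin_of_error(244, 850) * 100, 1))  # roughly 5.3 (percent)
```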
a particularly helpful participant went so far as to poll the reference colleagues in all nine of the libraries at his/her institution and found that "only a few had even heard of facebook." that librarians will become increasingly aware of online social networks was the sentiment expressed by another individual who wrote, "most librarians at my institution are unaware of social software in general, much less facebook. however, i think this will change in the future as social software is mentioned more often in traditional media (such as television and newspapers)." according to survey responses, it does not appear that use of facebook by students has been as noticeable or distracting in other libraries as it has been at hcl. when asked to describe their observation of student use of library computers to access facebook, 56 percent of those surveyed checked "rarely to never." only 20 percent indicated "most of the time" to "all of the time" (table 1). however, it is important to remember that only sixty individuals could verify that their institutions are registered on facebook. through comments, some librarians hinted that "snooping" or keeping mental notes of what students view on library computers is frowned upon. it simply is not our business. "we do not regulate or track student use of computers in the library," wrote one individual. several librarians noted that students were using facebook in the libraries, but more so on personal laptops than public-access computers.
practical impact of facebook
another goal of this study was to find out whether facebook has had any real impact on library services, such as an increase in bandwidth, library traffic, and noise, or in use of public-access computers, scanners, or other equipment. student complaints about monopolization of computers for use of facebook led administrators to block the site from computer labs at jsu. access to facebook on public-access terminals, however, was not regulated. survey responses revealed that facebook has had minimal impact on library services elsewhere. only one library was forced to develop a policy for specifically addressing computer-use concerns as a result of facebook use. one individual mailed the sign posted on every computer terminal in the library, which states, "if you are using a computer for games, chat, or other recreational activity, please limit your usage to thirty minutes. computers are primarily intended for academic use." another librarian reported that academic computing staff had to shut down access to facebook on library computers due to bandwidth and access issues. this individual, however, added, "interestingly, no one has complained to the library staff about its absence!" given a list of possible effects facebook may have had on library services and operations, 10 percent of respondents indicated that facebook has increased patron use of computers. seven percent agreed that it has increased patron traffic, and only 2 percent reported that the site has created bandwidth problems or slowed down internet access. only four individuals received patron complaints about other users "tying up" the computers with facebook (figure 3). since the advent of facebook, the public scanner has become one of the hottest items in hcl.
librarians at jsu know that use of the scanner has increased tremendously due to facebook because the scanner used by students to upload photos is attached to a public workstation next to the general reference desk. students often ask questions about uploading pictures to their facebook profiles as well as how to edit photos (e.g., resizing and cropping). one survey question asked whether scanner use had increased as a result of facebook. of the sixty-two respondents who answered this question (it was indicated that only those libraries that provide public access to scanners should answer the question), 77 percent reported that scanner use had not increased.
figure 1. institutions added to the facebook directory
figure 2. involvement with facebook
table 1. student use of library computers to access facebook (based on observation)
                    total   percentage
never                 23        32
rarely                17        24
some of the time      17        24
most of the time       7        10
all the time           7        10
furthermore, only two librarians have assisted students with the scanner or provided any other type of assistance, for that matter, with facebook. the assistance the two librarians gave included scanning photographs, editing photos, uploading photos to facebook profiles, and creating accounts. however, in a separate question, 21 percent of participants agreed that librarians should be responsible for helping students, when needed, with questions about facebook. no librarian has added additional equipment such as computers or scanners as a result of facebook. only one individual reported future plans by his/her library to add additional equipment in the future as a result of heavy use of the site.
perspectives toward facebook
one of the main goals of the study was to obtain a snapshot of the perspectives and attitudes of librarians toward facebook and online social networks in general. most of the librarians surveyed were neither enthusiastic nor disdainful of facebook. a small group of the respondents, however, when given the chance to comment, were extremely positive and excited about the possibilities of online social networking. twenty-one individuals saw no connection between libraries and facebook. sixty-seven librarians were in agreement that computer use for academic purposes should take priority, when needed, over use of facebook. however, fifty-one respondents indicated that librarians needed to keep up with internet trends, such as facebook, even when such trends are not academic in nature (table 2). out of 126 librarians who completed the survey, only 23 reported that facebook has generated discussion among library faculty and staff about online social networks. on the other hand, few individuals voiced negative opinions toward facebook. only 5 percent of those surveyed indicated that facebook annoyed faculty and staff. one individual wrote, "i don't like facebook or most social networking services. they encourage the formation of cliques and keep users from meeting and accepting those who are different than themselves." comments like this, however, were rare. although the majority of librarians seemed fairly apathetic toward facebook, few individuals expressed negative comments toward the site. few librarians indicated that facebook should be addressed or regulated in library policy. most individuals viewed the site as just another communication tool similar to instant messaging or cell phones.
in fact, while most librarians did not express much interest in facebook, many were quite vocal about not regulating its use. the following comment by one survey participant captures this sentiment: "attempts to restrict use of facebook in the library would be futile, in my opinion, in the same way it is now impossible to ban use of usb drives and aim in academic libraries."
table 2. access, assistance, and awareness of facebook and similar trends: perspectives
total   percentage
  67        53     computer use for academic purposes should take priority, when needed, over use of facebook.
  51        40     librarians need to "keep up" with internet trends, such as facebook, even when these trends are not academic in nature.
  35        28     library resources should not be monopolized with use of facebook.
  27        21     librarians should help students, when able, with questions regarding facebook.
  21        17     there is no connection between libraries and facebook.
  15        12     student use of facebook on library computers should not be regulated.
  11         9     library computers should be available for access to facebook, but librarians should not feel that it is their responsibility to assist students with questions regarding the site.
(respondents were allowed to check any or all responses that applied.)
figure 3. patron complaints about facebook
while most individuals agreed that academic use of computers should take priority over recreational use, a polite request that a patron using facebook allow another student to use the computer for academic purposes, when necessary, appears preferable to the creation and enforcement of strict policies. as one librarian put it, "i don't want students to see the library as a place where they are 'policed' unnecessarily." when asked if facebook serves any academic purpose, 54 percent of those surveyed indicated that it does not, while 34 percent were "not sure." twelve percent of the librarians identified academic potential or possible benefits of the site (figure 4). the authors were surprised to find that 46 percent of those surveyed were not completely willing to dismiss facebook as pure recreation. some librarians found facebook to be a distraction to academics: "maybe i'm old fashioned, but when do students find time for this kind of thing? i wonder about the impact of distractions like this on academic pursuits. there's still only twenty-four hours in a day." another individual asked two students who were using facebook in the library what they thought of the site and they admitted that it was "frequently a distraction from academic work." for the 34 percent who were not sure whether facebook has any academic value, there were comments such as "i am continuing to observe and will decide in the future." academic uses for facebook included suggestions that it be used as a communication tool for student collaboration in classes (facebook allows students to search for other students by course and section number). one individual suggested it could be used as an "online study hall," but then wondered if this might lead to plagiarism. some thought instructors could somehow use facebook for conducting online discussion forums, with one participant observing "it's 'cooler' than using blackboard." "building rapport" with students through a communication medium that many students are comfortable with was another benefit mentioned.
respondents who were enthusiastic about facebook thought it most beneficial as a virtual extension of the campus. facebook could potentially fill a void where face-to-face connections are absent in online and distance-education classes. several librarians suggested that facebook has had a positive influence in fostering collegiate bonds and school spirit. as one individual wrote, "[t]he academic environment is not only responsible for scholarly growth, but personal growth as well. this is just one method for students to interact in our highly technological society." facebook could provide students who are not physically on campus with a means to connect with other students at their institutions who have similar academic and social interests. some librarians were so enthusiastic about facebook that they suggested libraries use the site to promote their services. using the site to advertise library events and creating online library study groups and book clubs for students were some of the ideas expressed. one librarian wrote: "facebook (and other social networking sites) can be a way for libraries to market themselves. i haven't seen students using facebook in an academic manner, but there was a time when librarians frowned on e-mail and aim too. if it becomes a part of students' lives, we need to welcome it. it's part of welcoming them, too." more librarians, however, felt that facebook should serve as a space exclusively for students and that librarians, professors, administrators, police, and other uninvited folks should keep out. furthermore, as one individual noted, it is not "an appropriate venue" for librarians to promote their services. while the review of literature demonstrates that much has been made of online social networks and privacy issues, the librarians surveyed were not particularly concerned about privacy. only 19 percent indicated that they were concerned about privacy issues related to facebook. however, some librarians voiced concerns that many students are ignorant about the risks of posting personal information and photographs on facebook and do not seem fully aware of the possibility that individuals outside their social sphere might also have reason to access the site. one individual mentioned that the librarians at her institution have begun to emphasize this to students during library instruction sessions on internet research and evaluation.
figure 4. finds conceivable academic value in facebook
■ limitations
several limitations to this study must be noted when attempting to reach any type of conclusion. participants who had never heard of facebook obviously could not answer any questions except that they were not familiar with the site. some questions required respondents to "guesstimate." unless librarians have access to their institution's internet usage statistics, it would be hard for them to really know how much bandwidth is being used by students accessing facebook. librarians, having been trained in a profession that places a high value on freedom of access, might also be wary of activities that suggest any type of censorship. therefore, it is conceivable that some of the librarians surveyed do not know whether students are using facebook in the library because they make a point not to snoop or make note of individual web sites that students view.
■ discussion while online education is growing at a rapid rate across the united states, so is the presence of virtual academic social communities. although facebook might prove to be a passing fad, it is one of the earliest and largest online social networking communities geared specifically for students in higher education. it represents a new form of communication that connects students socially in an online environment. if online academics have evolved and continue to do so, then it is only natural that online academic social environments, such as facebook, will continue to evolve as well. while traditionally considered the heart of the campus, one is left to ponder the library’s presence in online academic social networks. what role the library will serve in these environments might largely depend on whether librarians are proactive and experi­ mental with this type of technology or whether they simply dismiss it as pure recreation. emerging technolo­ gies for communication should provoke, at the very least, an interest in and knowledge of their presence among library and information science professionals. this survey found that librarians were overwhelmingly aware of and moderately knowledgeable about facebook. some librarians were interested in and fascinated with facebook, but preferred to study it as outsiders. others had adopted the technology, but more for the purpose, it would seem, of having a better understanding of today’s students and why facebook (and other online social net­ working sites) appeals to so many of them. it is apparent from this study that there is a fine line between what now constitutes “academic” activity and “recreational” activity in the library. sites like facebook seem to blur this line fur­ ther and librarians do not seem eager or find it necessary to distinguish between the two unless absolutely pressed (e.g., asking a student to sign out of facebook when other patrons are waiting to use computers for academic work). one area of attention this study points to is a lack of con­ cern among librarians toward the internet and privacy issues. some individuals surveyed suggested that librari­ ans play a larger role in making students aware that people outside their society of friends—namely, administrative or authority figures—have the ability to access the informa­ tion they post online to social networks. participants were most enthusiastic about facebook’s role as a space where students in the same institution can connect and share a common collegiate bond. librarians who have not yet “checked out” facebook might consider one individual’s description of the site as “just another ver­ sion of the college yearbook that has become interactive.”42 among the most cherished books in hcl that document campus life at jsu are the mimosa yearbooks. alumni and students regularly flip through this treasure trove of pho­ tographs and memories. no administrator or librarian would dare weed this collection or find its presence irrele­ vant. while year books archive campus yesteryears, online social networks are dynamically documenting the here and now of campus life and shaping the future of how we communicate. as casey writes, “libraries are in the habit of providing the same services and the same programs to the same groups. 
we grow comfortable with our provision and we fail to change.”42 by exploring popular new types of internet services such as facebook instead of quickly dismissing them as irrelevant to librarianship, we might learn new ways to reach out and communicate better with a larger segment of our users. ■ acknowledgements the authors would like to acknowledge stephanie m. purcell, student worker at the houston cole library, for her excellent editing suggestions and insight into online social networks from the student’s point of view, and john­bauer graham, head of public services at the houston cole library, for his encouragement. references and notes 1. angela reid, “finally . . . the facebook,” the chanticleer, sept. 22, 2005, 4. 2. facebook.com, http://www.facebook.com/about.php (accessed dec. 2, 2005). 3. angus loten, “the great communicator,” inc.com., june 6, 2006, http://www.inc.com/30under30/zuckerberg.html (accessed dec. 4, 2005). 4. adam lashinsky, “facebook stares down success,” fortune, nov. 28, 2005, 4. 5. michael amington, “85 percent of college students use facebook,” testcrunch: tracking web 2.0 company review on facebook (sept. 7, 2005), http://www.techcrunch.com/2005/09/07/ 85­of­college­students­use­facebook (accessed dec. 2, 2005). 6. http://www.facebook.com/about.php. 7. facebook us! if you are a registered member of facebook, do a global search for “laurie charnigo” or “paula barnett­ ellis.” 32 information technology and libraries | march 200732 information technology and libraries | march 2007 8. stephen downes, “semantic networks and social net­ works,” the learning organization 12, no. 5 (2005): 411. 9. ibid. 10. tim o’reilly, “what is web 2.0?” http://www.oreilly net.com/pub/a/oreilly/tim/news/2005/09/30/what­is­web ­20.html (accessed aug. 6, 2006). 11. http://www.facebook.com/about.php. 12. angela provitera mcglynn, “teaching millennials, our newest cultural cohort,” the education digest 71, no. 4 (2005): 13. 13. laura garton, caroline haythornthwaite, and barry well­ man, “studying online social networks,” journal of computer mediated communication 31, no. 4 (1997). 14. facebook.com’s “about” page archives a collection of col­ lege newspaper articles about facebook: http://www.facebook .com/about.php (accessed dec. 4, 2005). 15. brock read, “think before you share,” the chronicle of higher education, jan. 20, 2006, a38–a41. 16. ibid., a41. 17. ibid., a40. 18. shawn mcguirk, “facebook on campus: understanding the issues,” magna web seminar presented live on june 14, 2006. transcripts available for a fee from magna pubs. http://www .magnapubs.com/catalog/cds/598755­1.html (accessed aug. 2, 2006). 19. tracy mitrano, “thoughts on facebook” (apr. 2006) cor­ nell university of information technologies, http://www.cit .cornell.edu/oit/policy/ memos/facebook.html (accessed june 22, 2006). 20. ibid., “conclusion.” 21. tabreez govani and harriet pashley, “student awareness of the privacy implications when using facebook,” unpublished paper presented at the “privacy poster fair” at the carnegie mellon university school of library and information science, dec. 14, 2005, 9, http://lorrie.cranor.org/courses/fa05/tubzhlp .pdf (accessed jan. 15, 2006). 22. ralph gross and alessandro acquisti, “information rev­ elation and privacy in online social networks,” paper presenta­ tion at the acm workshop on privacy in the electronic society, alexandria, va., nov. 7, 2005, 79, http://portal.acm.org/citation .cfm?id=1102214 (accessed nov. 30, 2005). 23. 
23. frederic stutzman, "an evaluation of identity-sharing behavior in social network communities," paper presentation at the idmaa and ims code conference, oxford, ohio, april 6–8, 2006, 3–6, http://www.ibiblio.org/fred/pubs/stutzman_pub4.pdf (accessed may 23, 2006).
24. gross and acquisti, "information revelation and privacy in online social networks," 73.
25. "myspace: design anarchy that works," business week, jan. 2, 2006, 16.
26. julian aiken, "hands off myspace," american libraries 37, no. 7 (2006): 33.
27. ibid.
28. jessi hempel and paula lehman, "the myspace generation," business week, dec. 12, 2005, 94.
29. http://www.facebook.com/about.php.
30. hempel and lehman, "the myspace generation," 87.
31. the authors created the "librarians and facebook" group on facebook to discuss issues concerning facebook and librarianship, such as censorship issues, policies, and ideas for connecting with students through facebook. this is a global group. if you have a facebook account, we invite you to do a search for "librarians and facebook" and join our group.
32. john c. dvorak, "academics get to work!" pc magazine online, http://www.pcmag.com/article2/0,1895,1928970,00.asp (accessed feb. 21, 2006).
33. michael j. bugeja, "facing the facebook," the chronicle of higher education, jan. 27, 2006, c1–c4; ibid.
34. maria tess shier, "the way technology changes how we do what we do," new directions for student services 112 (winter 2005): 83–84.
35. ibid., 84.
36. shier, "the way technology changes how we do what we do," 112; j. duboff, "'poke' your prof: faculty discovers thefacebook.com," yale daily news, mar. 24, 2005, http://www.yaledailynews.com/article.asp?aid=28845 (accessed jan. 15, 2006); mingyang liu, "would you friend your professor?" duke chronicle online, feb. 25, 2005, http://www.dukechronicle.com/media/paper884/news/2005/02/25/news/would.you.friend.your.professors-1472440.shtml?norewrite&sourcedomain=www.dukechronicle.com (accessed jan. 15, 2006).
37. brittany farb, "students can 'check out' new librarian on the facebook," student life (washington univ. in st. louis), feb. 27, 2006, http://www.studlife.com/home/index.cfm?event=displayarticle&ustory_id=5914a90d-53b (accessed feb. 27, 2006).
38. brian s. mathews, "do you facebook? networking with students online," college & research libraries news 37, no. 5 (2006): 306.
39. ibid., 307.
40. view the "houston cole library users want answers!" group by doing a search for the group title on facebook.
41. nces compare academic libraries, http://nces.ed.gov/surveys/libraries/compare/peervariable.asp (accessed dec. 2, 2005). the random sample was chosen using the research randomizer available online, http://www.randomizer.org/form.htm (accessed dec. 2, 2005).
42. michael e. casey and laura c. savastinuk, "library 2.0," library journal 131, no. 14 (2006): 40.

appendix a: survey on the impact of facebook on academic libraries

1. has your institution been added to the facebook directory?
   yes / no (skip to questions 10, 11, and 12) / not sure (skip to questions 10, 11, and 12) / i am not familiar with facebook (skip all questions and submit)
2. which best describes your involvement with facebook?
   i have a personal account / my library has an account / no involvement
3. which best describes your observation of student use of library computers to access facebook?
   all the time / most of the time / some of the time / rarely / never
4. has your library added additional equipment such as computers or scanners as a result of facebook use?
   yes / no / no, but we plan to in the future
5. have patrons complained about other patrons using library computers for facebook?
   yes / no / not sure
6. has your library had to develop a policy or had to address computer use concerns as a result of facebook use?
   yes / no / not sure
7. if your library provides public access to a scanner, has patron use of scanners increased due to the use of facebook?
   yes / no
8. have you assisted students with the library's scanner for facebook?
   yes / no
9. if you have provided assistance to students with facebook, please check all that apply:
   creating accounts
   scanning photographs or offering advice on where students can access a scanner
   editing photographs (e.g., resizing photos or use of a photo editor)
   uploading photographs to facebook profiles
   other __________________________________
10. check the responses that best describe your opinion about the responsibilities of librarians in assisting students with facebook questions and access to the web site:
   student use of facebook on library computers should not be regulated.
   library resources should not be monopolized with facebook use.
   computer use for academic purposes should take priority, when needed, over use of facebook.
   librarians should help students, when able, with facebook questions.
   librarians need to "keep up" with internet trends, such as facebook, even if they are not academic in nature.
   there is no connection between librarians, libraries, and facebook.
   library computers should be available for facebook use, but librarians should not feel that they need to assist students with facebook questions.
11. would you consider facebook to be a relevant academic endeavor?
   yes / no / not sure
12. if you answered "yes" to question 11, please describe how facebook could be considered an academic endeavor.
   ______________________________________________
13. please check all answers that best describe what effect, if any, use of facebook in the library has had on library services and operations?
   has increased patron traffic
   has increased patron use of computers
   has created computer access problems for patrons
   has created bandwidth problems or slowed down internet access
   has generated complaints from other patrons
   annoys library faculty and staff
   interests library faculty and staff
   has generated discussion among library faculty and staff about facebook
14. is privacy a concern you have about students using facebook in the library?
   yes / no / not sure

please list any observations, concerns, or opinions you have regarding facebook use in libraries.

extracted the paragraphs from my palm to my desktop, and saved that document and the tocs on a universal serial bus (usb) key. today, i combined them in a new document on my laptop and keyed the remaining paragraphs in my room at an inn on a pier jutting into commencement bay in tacoma on southern puget sound. i sought inspiration from the view out my window of the water and the fall color, from old crow medicine show on my ipod, and from early sixties beyond the fringe skits on my treo.
fred kilgour was committed to delivering informa­ tion to users when and where they wanted it. libraries must solve that challenge today, and i am confident that we shall. editorial continued from page 3 digitization has bestowed upon librarians and archivists of the late 20th and early 21st centuries the opportunity to reexamine how they access their collections. it draws these two traditional groups together with it specialists in order to collaborate on this new great challenge. in this paper, the authors offer a strategy for adapting a library system to traditional archival practice. t he librarian and the archivist . . . both collect, preserve, and make accessible materials for research; but significant differences exist in the way these materials are arranged, described, and used.”1 among the items usually collected by libraries are: published books and serials, and in more recent times, commercially available sound recordings, films, videos, and electronic resources of various types. archives, on the other hand, tend to collect original records of an organization, unique personal papers, as well as other effects of individuals and families. each type of institution, given its particular emphasis, has its own traditions and its own methods of dealing with its collections. most midto large-sized automated libraries in the united states and abroad use machine readable cataloging (marc) records to form the basis of their online catalogs. bibliographic records, including those in the marc format, generally represent an individually published item, or “information product,”2 and describe the physical characteristics of the item itself. the basic unit of archival description, however, is a much more complex entity than the basic unit of bibliographic description and often involves multiple hierarchical levels that may or may not extend down to the level of individual items. at portland state university (psu) the authors examined whether the capabilities of their present integrated library system could be expanded to capture the hierarchical structure of traditional archival finding aids. ■ background as early as 1841, the cataloging rules established by panizzi were geared toward locating individual published items. panizzi based his rules on the idea that any person looking for any particular book should be able to find it through the catalog.3 this tradition has continued over time up through current standards such as the anglo-american cataloguing rules and reaffirmed in marc, the standard for the representation and exchange of bibliographic information that has been widely used by libraries for over thirty years.4 archival description, on the other hand, is generally based on the fonds, that is, the entire collection of materials in any medium that were created, accumulated, and used by a particular person, family, or organization in the course of that creator’s activities and functions.5 thus, the basic unit of archival description, usually a finding aid, is a much more complex entity than the basic unit of bibliographic description, often involving multiple hierarchical levels of description that may or may not extend down to the level of individual items. before archival description begins, the archivist identifies related groups of materials and determines their proper arrangement. 
once the arrangement is determined, then the description of the materials reflects both their provenance and their original order.6 the first explicit statement of the levels of arrangement in an archival collection was by holmes and has since been elevated to the level of dogma in the archival community.7 a more recent statement in describing archives: a content standard (dacs) indicates that the actual levels of arrangement may differ for each collection. by custom, archivists have assigned names to some, but not all, levels of arrangement. the most commonly identified are collection, record group, series, file (or filing unit), and item. a large or complex body of material may have many more levels. the archivist must determine for practical reasons which groupings will be treated as a unit for purposes of description.8 rephrasing holmes, the five levels of arrangement can be defined as: 1. the collection level which holmes called the depository level—the breakdown of the depository’s complete holdings into a few major divisions based on the broadest common denominator 2. the record group level—the fonds or complete collection of the papers of a particular administrative division or branch of an organization or of a particular individual or family 3. the series level—the breakdown of the record group into natural series and the arrangement of each series with respect to the others 4. the filing unit level—the breakdown of each series into unit components, which are usually fairly obvious if the documents are kept in file folders 5. the document level—the level of individual items digital collection management through the library catalog michaela brenner, tom larsen, and claudia weston digital collection management through the library catalog | brenner, larsen, and weston 65 michaela brenner (brennerm@pdx.edu) and tom larsen (larsent@pdx.edu) are database maintenance and catalog librarians, and claudia weston (westonc@pdx.edu) is assistant university librarian for technical services, portland state university. 66 information technology and libraries | june 2006 the end result of archival description is usually a finding aid that ideally presents an accurate representation of the items in an archival collection so that users can, as independently as possible, locate them.9 building on the print finding aid, the archival community has explored a number of mechanisms for disseminating information on the availability of items in their collections. in 1983, the usmarc format for archival and manuscript control (marc-amc) was released and subsequently sanctioned for use as one possible standard data structure and communication protocol in the saa descriptive standard archives, personal papers, and manuscripts (appm) and its successor, dacs.10 its adoption, however, has been somewhat controversial among archivists.11 the difficulty in capturing the hierarchical nature of collections through the marc format is one factor that has limited the use of marc by the archival community. while it is possible to encode this hierarchical description in marc using notes and linking fields, few archivists in practice have actually made use of these linking fields.12 thus, in archival cataloging, marc records have been used primarily for collection-level description, allowing users to search and discover only general information about archival collections in online catalogs while the finding aid has remained the primary tool for detailed data at all levels of description. 
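the levels of arrangement described above lend themselves to a simple nested representation. the sketch below is only an illustration of that nesting, not a model of any actual repository or system; the class name and the sample units are invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArchivalUnit:
    """One node in a finding-aid hierarchy: collection, record group, series, file, or item."""
    level: str
    title: str
    children: List["ArchivalUnit"] = field(default_factory=list)

    def walk(self, depth: int = 0):
        # Yield (depth, unit) pairs in finding-aid outline order.
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

# Invented example: a record group broken down through the lower levels.
papers = ArchivalUnit("record group", "jane doe papers", [
    ArchivalUnit("series", "correspondence", [
        ArchivalUnit("file", "letters, 1950-1955", [
            ArchivalUnit("item", "letter to the city council, 3 may 1952"),
        ]),
    ]),
])

for depth, unit in papers.walk():
    print("  " * depth + f"{unit.level}: {unit.title}")
```

because description follows arrangement, a description at any level can simply be attached to the node for that level, which is the point the preceding paragraphs make about archival practice.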
in 1995, the encoded archival description (ead) emerged as a new standard for encoding descriptions of archival collections. the ead standard, like the marc standard, allows for the electronic storage and exchange of archival information; but unlike marc, it is based on the finding aid. ead is well suited for encoding the hierarchical relationships between the different parts of the collection and displaying them to the user, and it has become more widely adopted by the archival community. as outlined, the standards and systems chosen by an institution are dictated by the needs and traditions of that institution. the archival community relies heavily on finding aids and, with increasing frequency, on ead, their electronic extension; whereas the library community heavily relies on the online public access catalog (opac) and marc records. new trends capitalizing on the strengths of both traditions are evolving as libraries and archives seek ways to improve access to their archival and digital collections. ■ access to digital archival collections in libraries when searching the web for collections of information, one frequently encounters separate interfaces for traditional library, archival, and digital collections even though these collections may be owned, sponsored, hosted, or licensed by a single institution. descriptive records for traditional library materials reside in the opac and are constructed according to standard library practice, while finding aids for the archival and digital collections increasingly appear on specially designed web sites. this, of course, means that users searching the opac may miss relevant materials that are described only in the archival and digital documents database or web site. similarly, users searching the archival and digital documents database or web site may miss relevant materials that are described only in the opac. in other instances, libraries, such as the library of congress, selectively add records to their opacs for individual items in their archival and digital document collections. this incorporation allows users more complete access to items within the library’s collections. authority control and the assignment of descriptors further enhance access to the item-level records. to minimize processing costs, however, libraries frequently create brief descriptive records for items, thereby limiting their value to patrons.13 by creating descriptive records for the items only, libraries also obscure the hierarchical relationships among the items and the collections in which they reside. these relationships can provide the user with a useful context for the individual items and are an essential part of archival description. still other libraries, such as the university of washington, include collection-level marc records in the opac for their archival and digital document collections. these are searchable in the opac in the same way as bibliographic records for other materials. these collection-level records can then in turn be linked to finding aids that describe the collections more fully.14 collection-level records often are used in libraries where library resources may be insufficient for cataloging large collections of materials at the item level.15 the guidelines for collection-level records in appm and dacs, however, allow for additional fields that are not ordinarily used in library bibliographic records. 
these include such things as descriptions of the organization and arrangement of the collection, citations for published descriptions of the collection and links to the finding aid, and acknowledgment of the donors, as well as ample subject access to the collection. despite their potential for detail, collectionlevel records cannot provide the same degree of access to individual items as full item-level records. ■ an approach taken at portland state university library in many ways, archival and digital-document collections are continuing resources. a continuing resource is defined as “. . . a bibliographic resource that is issued over time digital collection management through the library catalog | brenner, larsen, and weston 67 with no predetermined conclusion. continuing resources include serials and ongoing integrating resources.”16 like published continuing resources, archival and digital collections generally are created over time with no predetermined conclusion. in fact, some archival collections continue to grow even after part of the collection has been accessioned by a library or archive. thus, even though many of the individual items in the collection might be properly treated as monographic (not unlike serial analytics), it would not be unreasonable to treat the entire collection as a continuing resource. with this in mind, the authors examined whether their electronic-resource management system could be adapted to accommodate evolving collections of digitized and born-digital material. more specifically, the present system was examined to determine whether its capabilities could be expanded to capture the hierarchical structure found in traditional archival finding aides. the electronic resource management system in use by psu library is innovative interfaces’ electronic resource management (erm) product. according to innovative interfaces inc.’s (iii) marketing literature, “[erm] effectively controls subscription and licensing information for licensed resources such as e-journals, abstracting and indexing (a&i) databases, and full-text databases.”17 to control and provide improved access to these resources, erm stores details about purchase orders, aggregators and publishers, subscription terms, licensing conditions, breadth of holdings, internal and external contact information, and other aspects of these resources that individual libraries consider relevant. for increased security and data integrity, multilevel permissions restrict viewing and editing of data to the appropriate level of staff or patron. the ability of erm to replicate the two-level hierarchical relationships between aggregators or publishers and the electronic and print resources they provide was of particular interest to the authors. through erm and iii’s batch record load capabilities, bibliographic and resource records can be loaded into the iii system using delimited source files such as those provided by serials solutions. resource records are the mechanisms used by iii to describe digital resources at a collection, subcollection, or title level, thereby enabling the capture of descriptive information not permitted by standard bibliographic records. iii uses holdings records to document serial holdings statements. according to the marc 21 formats for holdings data, a holdings statement is the “record of the location(s) and bibliographic units of a specific bibliographic item held at one or more locations.”18 iii holdings records may also contain a url for connecting to an electronic resource. 
in figure 1, for example, the resource record shows that psu library provides limited access to a number of journal titles through its springer journals online resource. as seen in figure 2, the display of a holdings record embedded in a bibliographic record provides more specific information on the availability of a title through the library’s collection. in this particular example, the information display reveals that print volumes are available for this title but that psu only has this title available as a part of the springer-verlag electronic collection accessible by clicking on the hotlink. more information on the springer collection can be discovered by clicking on the about resource button to retrieve the springer journals online resource record. this example, then, represents a two-level hierarchy where the resource springer journals online is analogous to an archival collection and abdominal imaging is analogous to an archival series. adaptation of erm for library-created digital collections was explored through work being done to fulfill the requirements of a grant received in 2005 by psu library. the goal of this grant was “to develop a digital library under the sponsorship of the portland state university library to serve as a central repository for the collection, accession, and dissemination of key planning documents and reports, maps, and other ephemeral materials that have high value for oregon citizens and for scholars around the world.”19 the overall collection is called the oregon sustainable community digital library (oscdl). in addition to having its own web site, it was decided to make this collection accessible through the psu library catalog so that patrons could find digitized original documents about the city of portland together with other library materials. bibliographic records would be added to the database with hyperlinks to the digitized original documents using existing staff and tools. these bibliographic marc records would be as complete as possible. initially, attention was focused on documents originating from four different sources: ernest bonner, a former portland city planner; the city of portland archives; metro (the regional government for the portland, oregon, metropolitan area); and trimet (the portland metropolitan public transportation system). along with the documents, metadata was received from various databases. these descriptions ranged from almost nothing to detailed archival descriptions. unlike the challenge of shifting titles and holdings with typical serials collections, the challenge of this project was to reflect the four hierarchical levels of psu library’s collection (figure 3). innovative’s system structure was manipulated in order to accomplish this. at the core of iii’s erm module are resource records (rr) created to reflect the peculiarities of a particular collection. linked to these resource records are holdings records (hr) containing hyperlinks to the actual digitized documents (doc h1 – doc h3) as well as to their respective bibliographic records (bib doc h1 – bib doc h3) containing additional information on the individual items within the collection (figure 4). 68 information technology and libraries | june 2006 first, resource records were manually created for three of the subcollections within the bonner collection. these subcollections contained documents reflecting the development of harbor drive, front street, and the park blocks. 
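the linkages described for figure 4 can be pictured as a small graph of records. the sketch below is a generic illustration of that shape only; the field names, identifiers, and url are invented and do not represent iii record structures or apis.

```python
# Invented, simplified stand-in for the figure 4 structure: one resource record
# per (sub)collection, one holdings record per digitized document, and a
# bibliographic record behind each holdings record.
resource_record = {
    "resource_id": "RR-EXAMPLE",     # collection-specific identifier (hypothetical)
    "title": "example subcollection",
}

bib_records = {
    "DOC-1": {"title": "example digitized report", "summary": "..."},
}

holdings_records = [
    {
        "bib_id": "DOC-1",                                 # link to the bibliographic record
        "resource_id": resource_record["resource_id"],     # link back to the resource record
        "url": "http://example.org/collection/doc-1.pdf",  # link to the digitized document
    },
]

# Navigation mirrors the OPAC display: from the document-level record a user can
# reach the digitized file or climb to the collection-level resource record.
for hr in holdings_records:
    bib = bib_records[hr["bib_id"]]
    print(f'{bib["title"]} -> {hr["url"]} (part of {hr["resource_id"]})')
```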
the fields defined for the resource records include the resource title; type (digitized documents) and format (pdf) of the resource; a hyperlink to the new oscdl web site; content and systems contact names; a brief description of the resource; and, most importantly, the resource id used to connect holding records for individual documents to the corresponding resource record. next, the batch-loading function in erm was used to create bibliographic and holding records and associate them with the resource records. taking advantage of tracking data produced during the digitization process (figure 5), spreadsheets were created for each collection reflecting the data assigned to each individual digitized document. the document title, the date the document was created, number of pages, and summaries were included. coordinates for the streets mentioned in the documents were also included. because erm uses issn numbers and titles as match points for record loads, ”issn” numbers were also manufactured for each document and included in the spreadsheet. these homemade numbers were distinguished by using pdx as a prefix followed by collection and document numbers or letters, for example, pdx0022090 or pdxhdcoll. fortunately, erm accepted these dummy issns (figure 6). from this data spreadsheet, the system-required comma delimited coverage load file (*.csv) was also created. for this file, the system only allows a limited number of fields, and is very particular about the right terms, including correct capitalization, for the header row. individual document titles, the made-up issn numbers, individual urls to the documents, and a collection-specific resource id (provider) that connects all the documents from a collection to their respective resource record were included. the resource id is the same for all documents in one collection (figure 7). in the first attempt, the system was set up to produce holdings and bibliographic records automatically, using the data from the spreadsheets. for the bibliographic records, a system-provided template was created that included some general subject headings, genre headings, an author field, and selected fixed fields, such as language, bibliographic level, and material type (figure 8). records for the harbor drive collection were loaded, and the system created brief bibliographic and holdings records and linked them to the harbor drive resource record. the records were globally updated to add the general material designator (gmd) “electronic resource” to the title as well as the phrase “digitized document” as a local “call number” to make these documents more visible in the browse screen of the online catalog (opac) (figure 9). the digitized documents now could be found in the library catalog by author, subject, or keyword. the brief bibliographic records (figure 10) allow the user to go either to the digitized document via url or to the resource record with more information on the resource itself and links to other items in the same collection. the resource record then provides links either to the new oscdl web site (via the oregon sustainable community digital library link at the bottom of the resource record), to the bibliographic description of the individual document, or to the digitized document (figure 11). however, the quality of the brief bibliographic records that had been batch generated through the system-provided template was not satisfactory (figure 8). 
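a short script along the following lines could produce the coverage load file described above. it is a sketch under assumptions: the article does not reproduce erm's required header labels, so the ones below are placeholders, and the document titles and urls are invented; only the general shape (title, manufactured issn, document url, and a shared provider/resource id) follows the text.

```python
import csv

documents = [
    # Titles and URLs here are invented stand-ins for rows in the tracking spreadsheet.
    {"title": "harbor drive planning report", "issn": "pdx0022090",
     "url": "http://example.org/oscdl/harbordrive/pdx0022090.pdf"},
    {"title": "harbor drive correspondence",  "issn": "pdx0022091",
     "url": "http://example.org/oscdl/harbordrive/pdx0022091.pdf"},
]

RESOURCE_ID = "pdxhdcoll"   # same provider value for every document in the collection

with open("harbor_drive_coverage.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "ISSN", "URL", "Provider"])   # placeholder header labels
    for doc in documents:
        writer.writerow([doc["title"], doc["issn"], doc["url"], RESOURCE_ID])
```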
it was decided that more document-specific data like summaries, number of pages, the dates the documents were created, geographical information, and documentlevel local subject headings should be included. these data were already available from the original spreadsheets. with limited time and staff resources, full bibliographic marc records were batch created using the spreadsheets, detailed templates adjusted slightly to each collection, microsoft mail merge, and finally, the marcedit program created by terry reese of oregon state university (http://oregonstate.edu/~reeset/marcedit/html/index.html). this gave maximum control over the data to be included and the way they would be included. it also eliminated the need to clean up the data following the record load (figure 12). subsequently, full bibliographic records were created for the subcollections harbor drive, front street, and park blocks, to connect them to the next higher level, the bonner collection (figure 3). these records were also contributed to worldcat. mimicking the process used at the document level, a resource record was created for the bonner collection and the holdings records for the three subcollections were connected with their corresponding bibliographic records (figure 13). resource records with their corresponding item-level records for trimet, the city archives, and metro followed. the final step was then to add the resource record and the bibliographic record for the whole oscdl collection (figure 14). since this last bibliographic record is not connected to a collection above it, there is only a hyperlink to the oscdl resource record (figure 15). more subcollections and their corresponding digital documents are continually being added to oscdl. structures in psu library’s opac are adjusted as these collections change. digital collection management through the library catalog | brenner, larsen, and weston 69 ■ conclusion according to salter, “digitizing, the current challenge that straddles the 20th and 21st centuries, has given archivists and librarians pause to reconsider access to their collections. the world of digitization is the catalyst for it people, librarians, and archivists to unify the way they do things.”20 in this paper, a strategy has been offered for adapting a library system to traditional archival practice. by making use of some of the capabilities of the module in psu library’s integrated library system that was originally designed for managing electronic resources, a method was developed for managing digital archival collections in a way that incorporates some of the features of a traditional finding aid. the contents of the various hierarchical levels of the collection are fully represented through the manipulation of the record structures available through psu’s system. this technique provides for enhanced access to the individual items of a collection by giving the context of the item within the collection. links between the hierarchical levels facilitate navigation between the levels. although the records created for traditional library systems are not as rich as those found in traditional finding aids, or in ead, their electronic equivalent; and the visual arrangements are not as intriguing as a wellplanned web site, the ability to show how items fit within the greater context of their respective collection(s) is a step toward reconciling traditional library and archival practices. 
enabling the library user to virtually browse through the overall resources offered by the library and then, if desired, through the various levels of a collection for relevant resources enhances the opportunities presented to the user for finding relevant information. references and notes 1. society of american archivists, “so you want to be an archivist: an overview of the archival profession,” 2004, www.archivists.org/prof-education/arprof.asp (accessed apr. 24, 2006). 2. kent m. haworth, “archival description: content and context in search of structure,” journal of internet cataloging 4, no. 3/4 (2001): 7–26. 3. antonio panizzi, “rules for the compilation of the catalogue,” the catalogue of the british museum 1 (1841): v–ix. 4. joint steering committee for revision of aacr, angloamerican cataloguing rules, 2nd ed., 2002 revision (chicago: ala, 2002). 5. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004). 6. haworth, “archival description.” 7. oliver w. holmes, “archival arrangement: five different operations at five different levels,” american archivist 27, no. 1 (1964): 21–41; terry abraham, “oliver w. holmes revisited: levels of arrangement and description of practice,” american archivist 54, no. 3 (1991): 370–77. 8. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004); xiii. 9. haworth, “archival description.” 10. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004); steven l. hensen, comp., archives, personal papers, and manuscripts, 2nd ed. (chicago: society of american archivists, 1989). 11. peter carini and kelcy shepherd, “the marc standard and encoded archival description,” library hi tech 22, no. 1 (2004): 18–27; steven l. hensen, “archival cataloging and the internet: the implications and impact of ead,” journal of internet cataloging 4, no. 3/4 (2001): 75–95. 12. abraham, “oliver w. holmes revisited.” 13. elizabeth j. weisbrod and paula duffy, “keeping your online catalog from degenerating into a finding aid: considerations for loading microformat records into the online catalog,” technical services quarterly 11, no. 1 (1993): 29–42. 14. carini and shepherd, “the marc standard and encoded archival description.” 15. see, for example, margaret f. nichols, “finding the forest among the trees: the potential of collection-level cataloging,” cataloging & classification quarterly 23, no. 1 (1996): 53–71; and weisbrod and duffy, “keeping your online catalog from degenerating into a finding aid.” 16. joint steering committee for revision of aacr, angloamerican cataloguing rules, d-2. 17. innovative interfaces inc., “electronic resources management,” 2005, www.iii.com/pdf/lit/eng_erm.pdf (accessed apr. 24, 2006). 18. library of congress, marc 21 format for holdings data: including guidelines for content designation (washington, d.c.: cataloging distribution service, library of congres, 2000), appendix e–glossary. 19. carl abbot, “planning a sustainable portland: a digital library for local, regional, and state planning and policy documents—framing paper,” 2005, http://oscdl.research.pdx. edu/framing.php (accessed apr. 24, 2006). 20. anne a. salter, “21st-century archivist,” newsletter, 2003, www.lisjobs.com/newsletter/archives/sept03asalter.htm (accessed apr. 24, 2006). 70 information technology and libraries | june 2006 figure 1. 
example of resource record from the psu library catalog (search conducted nov. 4, 2005) appendix. figures digital collection management through the library catalog | brenner, larsen, and weston 71 figure 2. example of a bibliographic record for a journal title from the psu library catalog (search conducted nov. 4, 2005) 72 information technology and libraries | june 2006 figure 4. resource record harbor drive with linked holdings records, bibliographic records, and original documents figure 3. partial diagram of the hierarchical levels of the collection digital collection management through the library catalog | brenner, larsen, and weston 73 figure 7. comma delimited coverage load file (*.csv) figure 6. data spreadsheet figure 5. spreadsheet for tracking data 74 information technology and libraries | june 2006 figure 9. browse screen in opac figure 8. bibliographic records template digital collection management through the library catalog | brenner, larsen, and weston 75 figure 11. resource record with various links figure 10. system-created brief bibliographic record in opac 76 information technology and libraries | june 2006 figure 13. bonner resource record with linked holdings records, bibliographic records, and original documents figure 12. full bibliographic record in opac digital collection management through the library catalog | brenner, larsen, and weston 77 figure 15. bibliographic record for the oscdl collection figure 14. outline of linked records in the collection lib-mocs-kmc364-20131012113204 190 communications automation and the service attitudes of arl circulation managers james r. martin: university of rochester library, rochester, new york. the circulation function in our large academic libraries has undergone two important transformations since the turn of the century. the first of these is departmentalization; the second, automation. the departmentalization of the circulation function has tended to separate the circulation department from the library's educational and information functions , the more "professional " aspects of librarianship. laurence miller makes this point in his dissertation, "changing patterns of circulation services in university libraries, " which focuses on the rise of circulation departmentalization.1 miller surveyed large academic libraries to determine if certain services-reference, interlibrary loan, orientation, catalog assistance-were being withdrawn from the circulation function . after verifying a withdrawal of these services and identifying them as the "professional" ones, miller drew the conclusion that circulation is therefore suspect as a professional activity. 2 his are generally held conclusions as robert oram suggests: until recently, librarians have been reluctant to deal with circulation problems on an organized basis. the belief that circulation was, in part at least, custodial and clerical rather than managerial and professional underlies much of the reluctance to solve mutual circulation problems th rough a professional group.' paralleling this change in the circulation function's organizational setting, the mechanization of the circulation process has continued to move from the laborious and slow use of manual procedures and book cards toward the immediate updating and record keeping of the online system. 
circulation automation has passed from the early days of simply mechanizing files (represented by the batch system) to the present, where libraries have the potential capacity to perform the complete circulation control process with real-time systems. • sophisticated online systems have begun to truly control the complete circulation function. the metamorphosis of circulation automationfrom simple mechanization to full computerization-has had a tremendous impact on the technical side, the processes, of the circulation department. likewise it may well have had impact on the service attitudes, priorities, and leadership of the department. the level of automation may relate to the circulation manager's attitudes and priorities, and in the words of an american library association committee, "the impact of automation might change the image of the circulation librarian." 5 as it automates, gaining control over its own processes, the circulation department and its manager may actually become more responsive to its users-more service oriented, more "professional." in february 1980, a questionnaire was sent to circulation managers of all the ninety-eight academic libraries that hold membership in the association of research libraries. 6 it sought to (1) identify the degree and state of automation of the circulation function , classified by the three system categories of manual , batch , and online systems, and (2) to capture opinions on the circulation manager's view of his management role and his attitudes on service issues and user demands. these attitudes were related to the three types of systems. seventy-six questionnaires were returned, for a 78 percent response rate. circulation department characteristics circulation departments ranged in size from 4 to 78 ite employees. the average department size was 18, the median 14.25. the number of students employed ranged from 0 to 175. twenty-nine percent of managers said staffing was not adequate and 45 percent said they had to depend too heavily upon students. fifty-seven percent of managers of manual systems responded that they had to depend too heavily upon students, versus 27 percent of batch and 50 percent of online managers. (because of variations in what is counted, transaction volume figures are not particularly informative.) circulation system characteristics the seventy-six responding libraries reported approximately thirty-two different system configurations. thirty-nine percent of these systems were manual, 34 percent were batch, and 26 percent were online. nineteen percent of the total were manual mcbee systems and 15 percent were libs100 online systems. manual systems had been in use an average of twenty-six years, batch systems an average of eight years (range: ten months to eighteen years), and online systems an average of three years (range: three months to eight years). circulation manager characteristics typically, the circulation manager in an arl library is the head of a department. arl circulation managers had held their positions from six months to twenty years. five years was the average, but 68 percent listed five years or less. gender was evenly distributed: thirty-eight males and thirtyeight females. the managers of manual systems were 43 percent male/ 57 percent female, those of batch systems were 54 percent male/46 percent female, and of online systems 55 percent male/45 percent female. seventy percent of all managers had an mls, and 30 percent did not; 40 percent of managers of online systems did not have an mls. 
a majority of circulation mancommunications 191 agers (57 percent) reported spending over 25 percent of their time on matters outside of strictly circulation concerns. in fact a substantial minority, 23 percent of all managers, spent over 50 percent on extracirculation matters. satisfaction with circulation system as a group, arl circulation managers are not satisfied with their systems, as table 1 shows. online-system managers consistently rate their systems most highly. asked if their systems were "close to ideal," only 17 percent of all respondents were affirmative. only 3 percent of manual-system managers agreed that their system was "close to ideal" as compared to 12 percent of batch managers and 45 percent of online managers. hidden in these averages is the fact that three managers gave their systems perfect scores on all four questions and those systems were all online: geac, libsloo, and an ibm-based online system. (table 2 summarizes responses on the four system-performance statements.) hardware, software, and downtime circulation managers with automated systems also reported on their experience with equipment, software, and downtime. batch-system managers were more satisfied with hardware and software (7 4 percent for both) than were online managers (60 percent satisfied with hardware and 65 percent with software). however, open-ended questions revealed that dissatisfaction with online-system hardware and software centered around limitations of the libs100 system (used by 55 percent of online-system managers). the libs100 system was panned for "inflexible software," "poor fines system," and "lack of reserve book features. " (these are all long-recognized limitations that were partially addressed in the relatively recent release 24.) the downtime situation was more satisfactory, however, for online managers than batch managers. seventy-five percent reported downtime was not a problem as against more than 63 percent of batch-system managers. 192 journal of library automation vol. 14/3 september 1981 table 1. responses by type of system (n = 30 manual , 26 batch, 20 online) strongly no strongly disagree agree agree opinion disagree "our circulation system is completely adequate" manual 1(3%) 4(13 %) batch 1(4%) 5(19%) online 3(15%) 7(35%) "our circulation system is reliable" manual 1 (3%) 15(50 o/o ) batch 3(12%) 9(35%) online 5(25 %) 11(55 %) 1(3 %) 1(4 %) 1(5 %) 1(3 %) 12(40% ) 13(50 %) 6(30%) 10(33 %) 11(42 %) 3(15 %) 14(40 %) 6(23%) 3(15%) 3(10 %) 3(12 %) 1(5 %) "our circulation system's records are very accurate" manual 2(7%) 7(23%) 2(7 %) 16(53 %) 9(35%) 6(30%) 3(10 %) 3(12 % ) batch 3(8%) 12(46%) online 4(20 o/o) 10(50%) "our circulation system is close to ideal" manual 1(3 %) batch 3(12 o/o) online 3(15%) 6(30%) 3(15 %) 7(23%) 8(31 o/o) 4(20%) 22(73%) 13(50%) 4(20%) table 2. summary of responses on four system questions (detail given in table 1) standard minimum maximum mean median deviation value value variance manual 9. 9 3.27 4 16 11 batch 10.o8· 8.5 3.81 4 18 15 online 13.45. 14 4.57 5 20 21 •20 =strongly agree, 16 =agree, 12 =no opinion, 8 =disagree, 4 =strongly disagree. service attitudes respondents were asked to mark attitude statements on a five-point scale: "strongly agree," "agree," "no opinion," "disagree," and "strongly disagree." attitude statements fell into four categories: (1) specific service concerns, (2) the importance of the managerial role, (3) user problems, contacts and complaints, and (4) user demands and expectations. 
the averages of the last three groups were used to explore the question of association between level of automation and manager service attitudes (see table 3).

table 3. attitude responses, averages
                                          manual   batch   online
management role (9 questions)              4.38     4.34    4.48
demands and expectations (6 questions)     3.48     3.52    3.46
contacts and complaints (6 questions)      3.88     3.90    4.03
totals                                     3.913    3.92    3.99
5 = most positive response; 1 = least positive response.

specific service concerns
ninety percent of circulation managers agreed that "speed of service is very important to users," and no online-system manager disagreed. forty-three percent of manual-system managers agreed that "control of circulating books tends to be inadequate." this compares to 16 percent of batch managers and 15 percent of online-system managers. asked whether "users tend to expect more service than the department can give," 56 percent of manual managers agreed, as did 46 percent of batch managers and 40 percent of online-system managers.

attitudes toward management role
the study found that circulation managers are uniformly strong in their affirmation of the importance of their role, with a slight tendency for online managers to be more affirmative. in fact, 100 percent of respondents agreed with the statement that the "management of the circulation function is important." ninety-three percent agreed that "circulation management should rank high among the library's priorities." ninety-five percent disagreed with the negative statement that "circulation management offers little opportunity for the exercise of initiative." ninety-four percent of all managers disagreed that "circulation management lacks complexity."

attitudes toward user problems, contacts, and complaints
the study found that circulation managers are uniformly strong in their desire to respond to user complaints and problems, but with a slight tendency for online managers to be more favorable to the user. one hundred percent of online managers regarded user contacts as pleasant, as did 93 percent of manual and batch managers. ninety-five percent of online managers, 92 percent of batch managers, and 87 percent of manual managers affirm that patron contact provides the challenge in circulation work. eighty percent of online managers and 73 percent of manual and batch managers rejected the statement that "complaints tend to be unfounded." sixty-five percent of the respondents of online systems were more likely to favor the user by thinking "complaints are most often substantive," as compared to 50 percent of manual managers and 48 percent of batch managers. ninety percent of online managers disagreed that users "complain far too much," compared with 84 percent of batch managers and 79 percent of manual managers.

attitudes toward user demands and expectations
circulation managers are generally favorable in their attitudes toward user demands and expectations. several statements in this area, however, ran contrary to the tendency of online managers to agree slightly more with attitudes favorable to the user than managers of batch and manual systems. for example, while 93 percent of manual-system managers and 85 percent of batch managers agreed that "the circulation department should be oriented towards users' expectations," only 70 percent of online managers did. on the statement, "users should be more tolerant of limitations in circulation services," manual managers disagreed by 34 percent, batch managers by 40 percent, and online managers by 20 percent.
these responses against the trend of the online manager as more user oriented may be due to the fact that the study was not completely successful in differentiating between responses based on general attitudes and those based directly on the specific system in use. in other words, the relative quality of each circulation system or even the "bugs" peculiar to a ~pecific system may affect one's attitude toward the user's need to tolerate the limitations of that system. manual-system managers know the limitations on their service are keyed to inefficient systems, whereas online-system managers know their systems and services are already at a high level. this knowledge of the system in use colors service attitudes. conclusion the study found a depressed state of circulation-system development and support in arl libraries. seventy-four percent of circulation managers, on average, rated their systems negatively on basic system integrity, as shown in table 2. the thirty manual-system managers gave their systems an average score of 9, to the effect that their systems were ideal, adequate, reliable, and accurate. the twentysix batch managers gave their systems an average score of 10.08, the twenty online managers an average of 13.45. recognizing the considerable constraints under which 194 journal of library automation vol. 14/3 september 1981 today's large academic libraries struggle, there is, nonetheless, room for criticism of library priorities. this study must be viewed as only a first step (largely tentative and exploratory) in relating automation with service attitudes. it suggests that online systems may be associated with managers more positive in their view of the management role and more positive in their attitudes toward users than batchand manual-system managers. further research would be useful at this point to compare levels of automation (manual, batch, and online) with circulation-staff service attitudes or those of patrons using the systems. references l. laurence miller, "changing patterns of circulation services in university libraries" (ph.d. dissertation, florida state university, 1971), p.iii. 2. ibid., p.149. 3. robert oram, "circulation," in allen kent and harold lancour, eds., encyclopedia of library and information science, v.s (new york: marcel dekker, 1971), p.l. 4. william h. scholz, "computer-based circulation systemsa current review and evaluation," library technolo gy reports 13:237 (may 1977). 5. robert oram , " circulation," p.2. 6. james robert martin , "automation and the service environment of the circulation manager" (ph.d. dissertation, florida state university, 1980), p.22. statistics on headings in the marc file sally h. mccallum and james l. godwin: network development office, library of congress, washington, d.c. in designing an automated system, it is important to understand the characteristics of the data that will reside in the system. work is under way in the network development office of the library of congress (lc) that focuses on the design requirements of a nationwide authority file. in support of this work, statistics relating to headings that appear on the bibliographic records in the lc marc ii files were gathered. these statistics provide information on characteristics of headings and on the expected sizes and growth rates of various subsets of authority files. 
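the counting that underlies these size estimates is easy to state in code: tally every heading occurrence across a set of bibliographic records, then count how many distinct headings remain once duplicates are removed. the sketch below is only an illustration of that step; the sample records and field shape are invented and do not reflect the marc fields actually examined.

```python
from collections import Counter

records = [
    {"headings": ["smith, john", "chemistry--periodicals"]},
    {"headings": ["smith, john", "physics"]},
]

# Tally every heading occurrence across the bibliographic records.
occurrences = Counter(h for rec in records for h in rec["headings"])

total_occurrences = sum(occurrences.values())   # headings as they appear on records -> 4
distinct_headings = len(occurrences)            # size of the equivalent authority file -> 3

print(total_occurrences, distinct_headings)
```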
this information will assist in making decisions concerning the contents of authority files for different types of headings and the frequency of update required for the various file subsets. then ational commission on libraries and information science supported this work. use of these statistics to assist in system design is largely system-dependent; however, some general implications are given in the last section of this paper. in general , counts were made of the number of bibliographic records, headings that appear in those records, and distinct headings that appear on the records. the statistics were broken down by year, by type of heading, and by file. in this paper, distinct headings are those left in a file after removal of duplicates. distinctness will not be used to imply that a heading appears only once in a source bibliographic file, although distinct headings may in fact have only a single occurrence. thus, a file of records containing the distinct headings from a set of bibliographic records is equivalent in size to a marc authority file of the headings in those bibliographic records. methodology these statistics were derived from four marc ii bibliographic record files maintained internally at lc: books, serials, maps, and films. the files contain updated versions of all marc records that have been distributed by lc on the books, serials, maps, and films tape:; frum 1969 through october 1979, and a few records that were then in the process of distribution. the files do not contain cip records. a total of l ,336,182 bibliographic records were processed, including 1,134,069 from the books file, 90,174 from the serials file, 60,758 from the maps file, and 51,176 from the films file. a file of special records, called access point (ap) records, was created that contains one record for the contents of each occurrence of the following fields in the bibliographic records: 184 information technology and libraries | december 2009 thomas sommer unlv special collections in the twenty-first century university of nevada las vegas (unlv) special collections is consistently striving to provide several avenues of discovery to its diverse range of patrons. specifically, unlv special collections has planned and implemented several online tools to facilitate unearthing treasures in the collections. these online tools incorporate web 2.0 features as well as searchable interfaces to collections. t he university of nevada las vegas (unlv) special collections has been working toward creating a visible archival space in the twenty-first century that assists its patrons’ quest for historical discovery in unlv’s unique southern nevada, gaming, and las vegas collections. this effort has helped patrons ranging from researchers to students to residents. special collections has created a discovery environment that incorporates several points of access, including virtual exhibits, a collection-wide search box, and digital collections. unlv special collections also has added web 2.0 features to aid in the discovery and enrichment of this historical information. these new features range from a what’s new blog to a digital collection with interactive features. the first point of discovery within the unlv special collections website began with the virtual exhibits. staff created the virtual exhibits as static html pages that showcased unique materials housed within unlv special collections. they showed the scope and diversity of materials on a specific topic available to researchers, faculty, and students. 
one virtual exhibit is “dino at the sands” (figure 1), a point of discovery for the history not only of dean martin but of many rat pack exploits.1 the photographs in this exhibit come from the sands collection. it is a static html page, and it provides information and pictures regarding one of las vegas’ most famous entertainers. this exhibit contains links to rat pack information and various resources on dean martin, including photographs, books, and videotapes. a second mode of discovery within the unlv special collections website is its new “search special collections” google-like search box (figure 2). this is located on the homepage and searches the manuscript, photograph, and oral history primary source collections.2 the purpose is to aid in the discovery of material within the collections that is not yet detailed in the public online catalog. in the past researchers would have to work through the special collection’s website to locate the resources. they can now go to one place to search for various types of material—a one-stop shop. the search results are easy to read and highlight the search term (see figure 3).3 the third point of access is the digital collection. these collections are digital copies of original materials located within the archives. the digital copies are presented online, described, and organized for easy access. each collection offers full-text searches, browsing, zoom, pan, figure 2. unlv special collections search box figure 1. “dino at the sands” exhibit thomas sommer (thomas.sommer@unlv.edu) is university and technical services archivist in special collections at the university of nevada las vegas libraries. unlv special collections in the twenty-first century | sommer 185 side-by-side comparison, and exporting for presentation and reuse. the newest example of a digital collection is “southern nevada: the boomtown years” (figure 4).4 this collection brings together a wide range of original materials from various collections located within unlv special collections, the nevada state museum, the historical society in las vegas, and the clark county heritage museum. it even provides standards-based activities for elementary and high school students. this project was funded by the nevada state library and archives under the library services and technology act (lsta) as amended through the institute of museum figure 4. “southern nevada: the boomtown years” digital collection figure 5. “what’s new” blog figure 6. unlv special collection facebook page figure 3. hoover dam search results 186 information technology and libraries | december 2009 and library services (imls). unlv special collections director peter michel selected the content. the team included fourteen members, four of whom were funded by the grant. christy keeler, phd, created the educator pages and designed the student activities. new collections are great, but users have to know they exist. to announce new collections and displays, special collections first added a what’s new blog that includes an rss feed to keep patrons up-to-date on new messages (figure 5).5 another avenue of interaction was implemented in april 2009 when special collections created its own facebook page (figure 6).6 students and researchers are encouraged to become fans. status updates with images and links to southern nevada and las vegas resources lead the fans back to the main website where the other treasures can be discovered. 
special collections has implemented various web 2.0 features within its newest digital collections. specifically, it added a comments section, a “rate it” feature, and an rss feature to its latest digital collections (figures 7, 8, and 9). these latest trends enrich the collections’ resources with patron-supplied information.7 as is apparent, unlv special collections implemented several online tools to allow patrons to discover its extensive primary resources. these tools range from virtual exhibits and digital collections with web 2.0 features to blogs and social networking sites. special collections has endeavored to stay on top of the latest trends to benefit its patrons and facilitate their discovery of historical materials in the twenty-first century. figure 8. “rate it” feature for aerial view of hughes aircraft plant photograph figure 7. comments section for aerial view of hughes aircraft plant photograph figure 9. rss feature for the index to the “welcome home howard” digital collection continued on page 190 190 information technology and libraries | december 2009 as previously mentioned, these easy-to-use tools can allow screencast videos and screenshots to be integrated into a variety of online spaces. a particularly effective type of online space for potential integration of such screencast videos and screenshots are library “how do i find . . .” research help guides. many of these “how do i find . . .” research help guides serve as pathfinders for patrons, outlining processes for obtaining information sources. currently, many of these pathfinders are in text form, and experimentation with the tools outlined in this article can empower library staff to enhance their own pathfinders with screencast videos and screenshot tutorials. reference 1. “unlv libraries strategic plan 2009–2011,” http://www .library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 30, 2009): 2. unlv special collections continued from page 186 references 1. peter michel, “dino at the sands,” unlv special collections, http://www.library.unlv.edu/speccol/dino/index.html (accessed july 28, 2009). 2. peter michel, “unlv special collections search box.” unlv special collections. http://www.library.unlv.edu/speccol/ index.html (accessed july 28, 2009). 3. unlv special collections search results, “hoover dam,” http://www.library.unlv.edu/speccol/databases/index .php?search_query=hoover+dam&bts=search&cols[]=oh&cols []=man&cols[]=photocoll&act=2 (accessed october 27, 2009). 4. unlv libraries, “southern nevada: the boomtown years,” http://digital.library.unlv.edu/boomtown/ (accessed july 28, 2009). 5. unlv special collections, “what’s new in special collections,” http://blogs.library.unlv.edu/whats_new_in_special_ collections/ (accessed july 28, 2009). 6. unlv special collections, “unlv special collections facebook homepage,” http://www.facebook.com/home .php?#/pages/las-vegas-nv/unlv-special-collections/70053 571047?ref=search (accessed july 28, 2009). 7. unlv libraries, “comments section for the aerial view of hughes aircraft plant photograph,” http://digital.library .unlv.edu/hughes/dm.php/hughes/82 (accessed july 28, 2009); unlv libraries, “‘rate it’ feature for the aerial view of hughes aircraft plant photograph,” http://digital.library.unlv.edu/ hughes/dm.php/hughes/82 (accessed july 28, 2009); unlv libraries, “rss feature for the index to the welcome home howard digital collection” http://digital.library.unlv.edu/hughes/ dm.php/ (accessed july 28, 2009). 
statement of ownership, management, and circulation
information technology and libraries, publication no. 280-800, is published quarterly in march, june, september, and december by the library information and technology association, american library association, 50 e. huron st., chicago, illinois 60611-2795. editor: marc truitt, associate director, information technology resources and services, university of alberta, k adams/cameron library and services, university of alberta, edmonton, ab t6g 2j8 canada. annual subscription price, $65. printed in u.s.a. with periodical-class postage paid at chicago, illinois, and other locations. as a nonprofit organization authorized to mail at special rates (dmm section 424.12 only), the purpose, function, and nonprofit status for federal income tax purposes have not changed during the preceding twelve months. extent and nature of circulation (average figures denote the average number of copies printed each issue during the preceding twelve months; actual figures denote actual number of copies of single issue published nearest to filing date: september 2009 issue). total number of copies printed: average, 5,096; actual, 4,751. mailed outside country paid subscriptions: average, 4,090; actual, 3,778. sales through dealers and carriers, street vendors, and counter sales: average, 430; actual, 399. total paid distribution: average, 4,520; actual, 4,177. free or nominal rate copies mailed at other classes through the usps: average, 54; actual, 57. free distribution outside the mail (total): average, 127; actual, 123. total free or nominal rate distribution: average, 181; actual, 180. total distribution: average, 4,701; actual, 4,357. office use, leftover, unaccounted, spoiled after printing: average, 395; actual, 394. total: average, 5,096; actual, 4,751. percentage paid: average, 96.15; actual, 95.87. statement of ownership, management, and circulation (ps form 3526, september 2007) filed with the united states post office postmaster in chicago, october 1, 2009.

bibliographic displays in web catalogs: does conformity to design guidelines correlate with user performance?
by joan m. cherry, paul muter, and steve j. szigeti
joan m. cherry (joan.cherry@utoronto.ca) is a professor in the faculty of information studies; paul muter (muter@psych.utoronto.ca) is an assistant professor in the department of psychology; and steve j. szigeti (szigeti@fis.utoronto.ca) is a doctoral student in the faculty of information studies and the knowledge media design institute, all at the university of toronto, canada.

the present study investigated whether there is a correlation between user performance and compliance with screen-design guidelines found in the literature. rather than test individual guidelines and their interactions, the authors took a more holistic approach and tested a compilation of guidelines. nine bibliographic display formats were scored using a checklist of eighty-six guidelines. twenty-seven participants completed ninety search tasks using the displays in a simulated web environment. none of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines. in some cases, user performance was actually significantly slower with greater conformity to guidelines. in a supplementary study, a different set of forty-three guidelines and the user performance data from the main study were used. again, none of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines. attempts to establish generalizations are ubiquitous in science and in many areas of human endeavor.
it is well known that this enterprise can be extremely problematic in both applied and pure science.1 in the area of human-computer interaction, establishing and evaluating generalizations in the form of interface-design guidelines are pervasive and difficult challenges, particularly because of the intractably large number of potential interactions among guidelines. using bibliographic display formats from web catalogs, the present study utilizes global evaluation by correlating user performance in a search task with conformity to a compilation of eighty-six guidelines (divided into four subsets). the literature offers many design guidelines for the user interface, some of which cover all aspects of the user interface, some of which focus on one aspect of the user interface—e.g., screen design. tullis, in chapters in two editions of the handbook of human-computer interaction, reviews the work in this area.2 the earlier chapter provides a table describing the screen-design guidelines available at that time. he includes, for example, galitz, whom he notes have several hundred guidelines addressing general screen design, and smith and mosier, whom he notes have about three hundred guidelines addressing the display of data.3 earlier guidelines tended to be generic. more recently, guidelines have been developed for specific applications—e.g., web sites for airline travel agencies, multimedia applications, e-commerce, children, bibliographic displays, and public-information kiosks.4 although some of the guidelines in the literature are based on empirical evidence, many are based on expert opinion and have not been tested. some of the researchbased guidelines have been tested in isolation or in combination with only a few other guidelines. the national cancer institute (nci) web site, research-based web design and usability guidelines, rates sixty guidelines on a scale of 0 to 5 based on the strength of the evidence.5 the more valid the studies that directly support the guideline, the higher the rating. in interpreting the scores, the site advises that scores of 1, 2, or 3 suggest that “more evidence is needed to strengthen the designer’s overall confidence in the validity of a guideline.” of the sixty guidelines on the site, forty-six (76.7 percent) fall into this group. in 2003, the united states department of health and human services web site, research-based web design and usability guidelines, rated 187 guidelines on a different five-point scale.6 eightytwo guidelines (43.9 percent) meet the criteria of having strong or medium research support. another forty-eight guidelines (25.7 percent) are rated as having weak research support. thus, there is some research support for 69.6 percent of the guidelines. in addition to the issue of the validity of individual guidelines, there may be interactions among guidelines. an interaction occurs if the effect of a variable depends on the level of another variable—e.g., an interaction occurs if the usefulness of a guideline depends on whether some other guideline is being followed. a more severe problem is the potential for high-order interactions: the nature of a two-way interaction may depend on the level of a third variable, the nature of a three-way interaction may depend on the level of a fourth variable, and so on. because of the combinatorial explosion, if there are more than a few variables the number of possible interactions becomes huge. 
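the scale of the problem is easy to make concrete. with eighty-six yes/no guidelines there are 2 to the 86th power possible combinations, and the number of distinct k-way interactions grows as the binomial coefficient c(86, k). a short calculation (standard-library python only; the figure of eighty-six comes from the checklist used later in this study) shows why exhaustive factorial testing is out of reach:

```python
from math import comb

N_GUIDELINES = 86  # size of the checklist used in the main study

# every yes/no pattern across the whole checklist
print(f"possible yes/no combinations: 2**{N_GUIDELINES} = {2 ** N_GUIDELINES:,}")

# number of distinct k-way interactions among the guidelines
for k in (2, 3, 4, 5):
    print(f"{k}-way interactions: c({N_GUIDELINES}, {k}) = {comb(N_GUIDELINES, k):,}")
```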
as cronbach stated: “once we attend to interactions, we enter a hall of mirrors that extends to infinity.”7 with a large set of guidelines, it is impractical to test all of the guidelines and all of the interactions, including high-order interactions. muter suggested several approaches for handling the problem of intractable high-order interactions, including adapting optimizing algorithms such as simplex, seeking “robustness in variation,” re-construing the problem, and pruning the alternative space.8 the present study utilizes another approach: global evaluation by correlating user performance with conformity to a set of guidelines. using this method, particular guidelines and interactions are not tested, but the set and subsets are tested globally, and some of the interactions, including high-order interactions, are captured. bibliographic displays were scored using a compilation of guidelines, divided into four subsets, and the performance of users doing a set of search tasks using the displays was measured. an attempt was made to determine whether users find information more quickly on displays that receive high scores on checklists of screen-design guidelines. the authors are aware of only two studies that have investigated conformity with a set of guidelines and user performance, and they both included only ten guidelines. d’angelo and twining measured the correlation between compliance with a set of ten standards (d’angelo standards) and user comprehension.9 the d’angelo standards are in the form of principles for web-page design, based on a review of the literature.10 d’angelo and twining found a small correlation (.266) between number of standards met and user comprehension.11 they do not report on statistical significance, but from the data provided in the paper it appears that the correlation is not significant. gerhardt-powals compared an interface designed according to ten cognitive engineering principles to two control interfaces and found that the cognitively engineered interface resulted in statistically significantly superior user performance.12 the guidelines used in the present study were based on a list compiled by chan to evaluate displays of bibliographic records in online library catalogs.13 the set of guidelines was broken down into four subsets. participants in this study were given search tasks and clicked on the requested item on a bibliographic display. the main dependent variable of interest was response time. ฀ method participants twenty-seven participants were recruited through the university of toronto psychology 100 subject pool. seventeen were female; ten were male. most (twenty) were in the age group 17 to 24; three were in the age group 25 to 34 years, and four were in the age group 35 to 44. one had never used the web; all others reported using the web one or more hours per week. participants received course credit.
design to control for the effects of fatigue, practice runs, and the like, the order of trials was determined by two orthogonal 9 x 9 latin squares—one to select a display and one to select a book record. each participant completed five consecutive search tasks—author, title, call number, publisher, and date—in a random order, with each display-book combination. (the order of the five search tasks was randomized each time.) this procedure was repeated, so that in total each participant did ninety tasks (9 displays x 5 tasks x 2 repetitions). materials and apparatus the study used nine displays from library catalogs available on the web. they were selected to represent a variety of systems and to illustrate the remarkable diversity in bibliographic displays in web catalogs. the displays differed in the amount of information included, the structure of the display, employment of highlighting techniques, and use of graphical elements. four examples of the nine displays are presented in figures 1a, 1b, 1c, and 1d. the displays were captured and presented in an interactive environment using active server page (asp) software. the look of the displays was retained, but hypertext links were deactivated. nine different book records were used to provide the content for the displays. items selected were those that would be readily understood by most users—e.g., books by saul bellow, norman mailer, and john updike. the guidelines were based on a list compiled by chan from a review of the literature in human-computer interaction and library science.14 the list does not include guidelines about the process of design. chan formatted the guidelines as a checklist for bibliographic displays in online catalogs. in work reported in 1996, cherry and cox modified the checklist for use with bibliographic displays in web catalogs.15 in a 1998 paper, cherry reported on evaluations of bibliographic displays in catalogs of academic libraries, based on chan’s data for twelve opacs and data for ten web catalogs evaluated by cherry and cox using a modification of the 1996 checklist for web catalogs.16 the findings showed that, on average, displays in opacs scored 58 percent and displays in web catalogs scored 60 percent. the 1996 checklist of guidelines was modified by herrero-solana and de moya-anegón, who used it to explore the use of multivariate analysis in evaluating twenty-five latin american catalogs.17 for the present study, four questions that were considered less useful were removed from the checklist used in cherry’s 1998 analysis. the checklist consisted of four sections or subsets: labels (these identify parts of the bibliographic description); text (the display of the bibliographic, holdings/location, and circulation status information); instructions (includes instructions to users, informational messages, and options available); and layout (includes identification of the screen, the organization for the bibliographic information, spacing, and consistency of information presentation). items on the checklist were phrased as questions requiring yes/no responses.
examples of the items are: labels: “are all fields/variables labeled?” text: “is the text in mixed case (upper and lowercase)?” instructions: “are instructional sentences or phrases simple, concise, clear, and free of typographical errors?” and layout: “is the width of the display no more than forty to sixty characters?” the set used in the present study contained eighty-six guidelines in total, of which forty-eight were generic and could be applied to any application. thirty-eight were specific and applied to bibliographic displays in web catalogs. the experiment was run on a pentium computer with a seventeen-inch sony color monitor with a standard keyboard and mouse.

figure 1a. example of display
figure 1b. example of display
figure 1c. example of display
figure 1d. example of display

procedure participants were tested individually. five practice trials with a display and book record not used in the experiment familiarized the participant with the tasks and software. at the beginning of a trial, the message “when ready, click” appeared on the screen. when the participant clicked on the mouse, a bibliographic display appeared along with a message at the top of the screen indicating whether the participant should click on the author, title, call number, publisher, or date of publication—e.g., “current task: author.” participants clicked on what they thought was the correct answer. if they clicked on any other area, the display was shown again. an incorrect click was not defined as an error—in effect, percent correct was always 100—but an incorrect click would of course add to the response time. the software recorded the time to successfully complete each search, the identification for the display and the book record, and the search-task type. when a participant completed the five search tasks for a display, a message was shown indicating the average response time on that set of tasks. when participants completed the ninety search tasks, they were asked to rank the nine displays according to their preference. for this task, a set of laminated color printouts of the displays was provided. participants ranked the displays, assigning a rank of 1 to the display that they preferred most, and 9 to the one they preferred least. they were also asked to complete a short background questionnaire. the entire session took less than forty-five minutes. scoring the displays on screen design guidelines the authors’ experience has indicated that judging whether a guideline is met can be problematic: evaluators sometimes differ in their judgments. in this study, three evaluators assessed each of the nine displays independently. if there was any disagreement amongst the evaluators’ responses for a given question for a given display, that question was not used in the computation of the percentage score for that display. (a guideline regarding screen density was evaluated by only one evaluator because it was very time-consuming.) the total number of questions used to assess each display was eighty-six. the number of questions on which the evaluators disagreed ranged from twelve to thirty across the nine displays. all questions on which the three evaluators agreed for a given display were used in the calculation of the percentage score for that display. hence the percentage scores for the displays are based on a variable set and number of questions—from fifty-six to seventy-four.
the subset of questions on which the three evaluators agreed for all nine displays was small—twenty-two questions. ฀ results with regard to conformity to the guidelines, in addition to the overall scores for each display, which ranged from 42 percent to 65 percent, the percentage score was calculated for each subset of the checklist (labels, text, instructions, and layout). the time to successfully complete each search task was recorded to the nearest millisecond. (for some unknown reason, six of the 2,430 response times recorded [27 x 90] were 0 milliseconds. the program was written in such a way that the response-time buffer was cleared at the time of stimulus presentation, in case the participant clicked just before this time. these trials were treated as missing values in the calculation of the means.) six mean response times were calculated: author, title, call number, publisher, date, and the sum of the five response times, called all tasks. the mean of all tasks response times ranged from 13,671 milliseconds to 21,599 milliseconds for the nine formats. the nine display formats differed significantly on this variable according to an analysis of variance, f(8, 477) = 17.1, p < .001. the correlations between response times and guidelines-conformance scores are presented in table 1.

table 1. correlations between scores on the checklist of screen design guidelines and time to complete search tasks: pearson correlation (sig. 2-tailed); n=9 all cells

                all tasks     author        title         call #        publisher     year
total score:    .469 (.203)   .401 (.285)   .870 (.002)   .547 (.127)   .035 (.930)   .247 (.522)
labels:         .722 (.028)   .757 (.018)   .312 (.413)   .601 (.087)   .400 (.286)   .669 (.049)
text:          -.260 (.500)  -.002 (.997)   .595 (.091)  -.191 (.623)  -.412 (.271)  -.288 (.452)
instructions:   .422 (.258)   .442 (.234)   .712 (.032)   .566 (.112)   .026 (.947)   .126 (.748)
layout:         .602 (.086)  -.102 (.794)   .383 (.308)   .624 (.073)   .492 (.179)   .367 (.332)

it is important to note that a high correlation between response time and conformity to guidelines indicates a low correlation between user performance (speed) and conformity to guidelines. row 1 of table 1 contains correlations between the total guidelines score and response times; column 1 contains correlations between all tasks (the sum of the five response times) and guidelines scores. of course, the correlations in table 1 are not all independent of each other. only five of the thirty correlations in table 1 are significant at the .05 level, and they all indicate slower response times with higher conformity to guidelines. of the six correlations in table 1 indicating faster response times with higher conformity to guidelines, none approaches statistical significance. the upper left-hand cell of table 1 indicates that the overall correlation between total scores on the guidelines and the mean response time across all search tasks (all tasks) was 0.469 (df = 7, p = 0.203)—i.e., conformity to the overall checklist was correlated with slower overall response times, though this correlation did not approach statistical significance. figure 2 shows a scatter plot of the main independent variable, overall score on the checklist of guidelines, and the main dependent variable, the sum of the response times for the five tasks (all tasks). figure 3 shows a scatter plot for the highest obtained correlation: between score on the overall checklist of guidelines and the time to complete the title search task. visual inspection suggests patterns consistent with table 1: no correlation in figure 2, and slower search times with higher guidelines scores in figure 3.

figure 2. scatter plot for overall score on checklist of screen design guidelines and time to complete set of five search tasks
figure 3. scatter plot for overall score on checklist of screen design guidelines and time to complete “title” search tasks

finally, correlations were computed between preference and response times (all tasks response times and five specific-task response times) and between preference and conformity to guidelines (overall guidelines and the four subsets of guidelines). none of the eleven correlations approached statistical significance. ฀ supplementary study to further validate the results of the main study, it was decided to score the interfaces against a different set of guidelines based on the 2003 u.s.
department of health and human services research-based web design and usability guidelines. this set consists of 187 guidelines and includes a rating for each guideline based on strength of research evidence for that guideline. the present study started with eighty-two guidelines that were rated as having either moderate or strong research support, as the definitions of both of these include “cumulative research-based evidence.”18 compliance with guidelines that address the process of design can only be judged during the design process, or via access to the interface designers. since this review process did not allow for that, a total of nine process-focused guidelines were discarded. this set of seventy-three guidelines was then compared with the sixty-guideline 2001 nci set, research-based web design and usability guidelines, intending to add any outstanding nci guidelines supported by strong research evidence to the existing list of seventy-three. however, all of the strongly supported nci guidelines were already represented in the original seventy-three. finally, the guidelines in the iso 9241, ergonomic requirements for office work with visual display terminals (vdts), part 11 (guidance on usability), part 12 (presentation of information ), and part 14 (menu dialogues ) were compared to the existing set of seventy-three, with the intention that any prescriptive guideline in the iso set that was not already included in the original seventy-three would be added.19 again, there were none. the seventy-three guidelines were organized into three thematic groups: (1) layout (the organization of textual and graphic material on the screen), (2) interaction (which included navigation or any element with which the user would interact), and (3) text and readability. all of the guidelines used were written in a manner allowing readers room for interpretation. the authors explicitly stated that they were not writing rules, but rather, guidelines, and recognized that their application must allow for a level of flexibility.20 this ambiguity creates problems in terms of assessing displays. in this study, two evaluators independently assessed the nine displays. the first evaluator applied all seventy-three guidelines and found thirty to be nonapplicable to the specific types of interfaces considered. the second evaluator applied the shortened list of forty-three guidelines. following the independent evaluations, the two evaluators compared assessments. the initial rate of agreement between the two assessments ranged from 49 percent to 70 percent across the nine displays. in cases where there was disagreement, the evaluators discussed their rationale for the assessment in order to achieve consensus. ฀ results of supplementary study as with the initial study, in addition to the overall scores for each display, the percentage score was calculated for each subset of the checklist (labels, interaction, and text and readability). it is worth noting that the overall scores witnessed higher compliance to this second set of guidelines, ranging from 68 percent to 89 percent. the correlations between response times and guidelines-conformance scores are presented in table 2. again, it is important to note that a high correlation between response time and conformity to guidelines indicates a low correlation between user performance (speed) and conformity to guidelines. 
table 2. correlations between scores on subset of the u.s. dept. of health and human services (2003) research-based web design and usability guidelines and time to complete search tasks: pearson correlation (sig. 2-tailed); n=9 all cells

                all tasks     author        title         call #        publisher     year
total score:    .292 (.445)   .201 (.604)   .080 (.839)  -.004 (.992)   .345 (.363)   .499 (.172)
layout:        -.308 (.420)  -.264 (.492)  -.512 (.159)  -.332 (.383)   .046 (.906)  -.294 (.442)
text:           .087 (.824)  -.051 (.895)   .712 (.032)  -.059 (.879)  -.095 (.808)  -.259 (.500)
interaction:    .638 (.065)   .603 (.085)   .055 (.887)   .439 (.238)   .547 (.128)   .625 (.072)

row 1 of table 2 contains correlations between the total guidelines score and response times; column 1 contains correlations between all tasks (the sum of the five response times) and guidelines scores. of course, the correlations in table 2 are not all independent of each other. only one of the twenty-four correlations in table 2 is significant at the .05 level, and it indicates a slower response time with higher conformity to guidelines. of the ten correlations in table 2 indicating faster response times with higher conformity to guidelines, none approaches statistical significance. the upper left-hand cell of table 2 indicates that the overall correlation between total scores on the guidelines and the mean response time across all search tasks (all tasks) was 0.292 (p = 0.445)—i.e., conformity to the overall checklist was correlated with slower overall response times, though this correlation did not approach statistical significance. figure 4 shows a scatter plot of the main independent variable, overall score on the checklist of guidelines, and the main dependent variable, the sum of the response times for the five tasks (all tasks). figure 5 shows a scatter plot for the highest-obtained correlation: between score on the text and readability category of guidelines and the time to complete the title search task. visual inspection suggests patterns consistent with table 2: no correlation in figure 4, and slower search times with higher guidelines scores in figure 5.

figure 4. scatter plot for subset of u.s. department of health and human services (2003) research-based web design and usability guidelines conformance score and total time to complete five search tasks
figure 5. scatter plot for text and readability category of u.s. department of health and human services (2003) research-based web design and usability guidelines and time to complete “title” search tasks

฀ discussion in the present experiment and the supplementary study, none of the correlations indicating faster user performance with greater conformity to guidelines approached statistical significance. in some cases, user performance was actually significantly slower with greater conformity to guidelines—i.e., in some cases, there was a negative correlation between user performance and conformity to guidelines. the authors are aware of no other study indicating a negative correlation between user performance and conformity to interface design guidelines. some researchers would not be surprised at a finding of zero correlation between user performance and conformity to guidelines, but a negative correlation is somewhat puzzling. a negative correlation implies that there is something wrong somewhere—perhaps incorrect underlying theories or an incorrect body of assumptions. such a negative correlation is not without precedent in applied science. in the field of medicine, before the turn of the twentieth century, seeing a doctor actually decreased the chances of improving health.21 presumably, medical guidelines of the time were negatively correlated with successful practice, and the negative correlation implies not just worthlessness, but medical theories or beliefs that were actually incorrect and harmful.
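as a concrete illustration of the global-evaluation analysis reported above, the sketch below correlates per-display checklist scores with mean all-tasks response times. the nine score/time pairs are invented for illustration (the article reports only the ranges, 42 to 65 percent and 13,671 to 21,599 milliseconds, not the individual display values); a positive coefficient means that higher conformity went with slower searches.

```python
from math import sqrt

# hypothetical per-display values, one pair per display format;
# only the ranges (42-65 percent, 13,671-21,599 ms) come from the article.
scores = [42, 45, 48, 51, 54, 57, 60, 63, 65]          # checklist score, percent
times = [14200, 16800, 13671, 18100, 15900, 19400,     # mean all-tasks time, ms
         17300, 21599, 20100]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"r = {pearson(scores, times):.3f}")  # positive r: higher conformity, slower searches
```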
the boundary conditions of the present findings are unknown. the present findings may be specific to the tasks employed—fairly simple search tasks. the findings may apply only to situations in which the user is switching formats frequently, as opposed to situations in which each user is using only one format. (a between-subjects design would test this possibility.) the findings may be specific to the two sets of guidelines used. with sets of ten guidelines, d’angelo and twining and gerhardt-powals found positive correlations between user performance and conformity to guidelines (though apparently not statistically significantly in the former study).22 the guidelines used in the authors’ main study and supplementary study tended to be more detailed than in the other two studies. detailed guidelines are sometimes seen as advantageous, since developers who use guidelines need to be able to interpret the guidelines in order to implement them. however, perhaps following a large number of detailed guidelines reduces the amount of personal judgment used and results in less effective designs. (designers of the nine displays used in the present study would not have been using either of the sets of guidelines used in our studies but may have been using some of the sources from which our guidelines were extracted.) as noted by cheepen in discussing guidelines for voice dialogues, sometimes a designer’s experience may be more valuable than a particular guideline.23 the lack of agreement in interpreting the guidelines was an unexpected but interesting factor revealed during the collection of data in both the main study and the supplementary study. while a higher rate of agreement had been expected, the differences raised an important point in the use of guidelines. if guidelines intentionally leave room for interpretation, what role do expert opinion and experience play in design? in the main study, the number of guidelines on which the evaluators disagreed ranged from 14 percent to 35 percent across the nine displays. in the supplementary study, both evaluators had experience in interface design through a number of different roles in the design process (both academic and professional). this meant the evaluators’ interpretations of the guidelines were informed by previous experience. the initial level of disagreement ranged from 30 percent to 51 percent across the nine displays. while it was possible to quickly reach consensus
on a number of assessments (because both evaluators recognized the high degree of subjectivity that is involved in design), it also led to longer discussions regarding the intentions of the guideline authors. a majority of the differences involved lack of guideline clarity (where one evaluator had indicated a meet-or-fail score, while another felt the guideline was either unclear or not applicable). does this imply that guidelines can best be applied by committees or groups of designers? the dynamic of such groups would add another complex variable to understanding the relationship between guideline conformity and user performance. future research should test other tasks and other sets of guidelines to confirm or refute the findings of the present study. there should also be investigation of other potential predictors of display effectiveness. for example, would the ratings of usability experts or graphic designers for a set of bibliographic displays be positively correlated with user performance? crawford, in response to a paper presenting findings from an evaluation of bibliographic displays using a previous version of the checklist of guidelines used in the main study, commented that the design of bibliographic displays still reflects art, not science.24 several researchers have discussed aesthetics and user interface design. reed et al. noted the need to extend our understanding of the role of aesthetic elements in the context of user-interface guidelines and standards.25 ngo, teo, and byrne discussed fourteen aesthetic measures for graphic displays.26 norman discussed these ideas in “emotions and design: attractive things work better.”27 tractinsky, katz, and ikar found strong correlations between perceived aesthetic appeal and perceived usability.28 most empirical studies of guidelines have looked at one variable only or, at the most, a small number of variables. the opposite extreme would be to do a study that examines a large number of variables factorially. for example, assuming eighty-six yes/no guidelines for bibliographic displays, it would be theoretically possible to do a factorial experiment testing all possible combinations of yes/no—2 to the 86th power. in such an experiment, all two-way interactions and higher interactions could be assessed, but such an experiment is not feasible. what the authors have done is somewhere between these two extremes. this study has the disadvantage that we cannot say anything about any individual guideline, but it has the advantage that it captures some of the interactions, including high-order interactions. despite the present results, the authors are not recommending abandoning the search for guidelines in interface design. at a minimum, the use of guidelines may increase consistency across interfaces, which may be helpful.
however, in some research domains, particularly when huge numbers of potential interactions result in extreme complexity, it may be advisable to allocate resources to means other than attempting to establish guidelines, such as expert review, relying on tradition, letting natural selection take its course, utilizing the intuitions of designers, and observing user-interaction. indeed, in pure and applied research in general, perhaps more resources should be allocated to means other than searching for explicit generalizations. future research may better indicate when to attempt to establish generalizations and when to use other methods. ฀ acknowledgements this work was supported by a social sciences and humanities research council general research grant awarded by the faculty of information studies, university of toronto, and by the natural sciences and engineering research council of canada. the authors wish to thank mark dykeman and gerry oxford who developed the software for the experiment; donna chan, joan bartlett, and margaret english, who scored the displays with the first set of guidelines; everton lewis, who conducted the experimental sessions; m. max evans, who helped score the displays with the supplementary set of guidelines; and robert l. duchnicky, jonathan l. freedman, bruce oddson, tarjin rahman, and paul w. smith for helpful comments. references and notes 1. see, for example, a. chapanis, “some generalizations about generalization,” human factors 30, no. 3 (1988): 253–67. 2. t. s. tullis, “screen design,” in handbook of human-computer interaction, ed. m. helander (amsterdam: elsevier, 1988), 377–411; t. s. tullis, “screen design,” in handbook of humancomputer interaction, 2d ed., eds. m. helander, t. k. landauer, and p. prabhu (amsterdam: elsevier, 1997), 503–31. 3. w. o. galitz, handbook of screen format design, 2d ed. (wellesley hills, mass.: qed information sciences, 1985); s. l. smith and j. n. mosier, guidelines for designing user interface software, technical report esd-tr-86-278 (hanscom air force base, mass.: usaf electronic systems division, 1986). 4. c. chariton and m. choi, “user interface guidelines for enhancing the usability of airline travel agency e-commerce web sites,” chi ‘02 extended abstracts on human factors in computing systems, apr. 20–25, 2002 (minneapolis, minn.: acm press), 676–77, http://portal.acm.org/citation .cfm?doid=506443.506541 (accessed dec. 28, 2005); m. g. wadlow, “the andrew system; the role of human interface guidelines in the design of multimedia applications,” current psychology: research and reviews 9 (summer 1990): 181–91; j. kim and j. lee, “critical design factors for successful e-commerce systems,” behaviour and information technology 21, no. 3 (2002): 185–99; s. giltuz and j. nielsen, usability of web sites for children: 162 information technology and libraries | september 2006 70 design guidelines (fremont, calif.: nielsen norman group, 2002); juliana chan, “evaluation of formats used to display bibliographic records in opacs in canadian academic and public libraries,” master of information science research project report (university of toronto: faculty of information studies, 1995); m. c. maquire, “a review of user-interface design guidelines for public information kiosk systems,” international journal of human-computer studies 50, no. 3 (1999): 263–86. 5. national cancer institute, research-based web design and usability guidelines (2001), www.usability.gov/guidelines/index .html (accessed dec. 28, 2005). 6. u.s. 
department of health and human services, researchbased web design and usability guidelines (2003), http://usability .gov/pdfs/guidelines.html (accessed dec. 28, 2005). 7. l. j. cronbach, “beyond the two disciplines of scientific psychology,” american psychologist 30, no. 2 (1975): 116–27. 8. p. muter, “interface design and optimization of reading of continuous text,” in cognitive aspects of electronic text processing, eds. h. van oostendorp and s. de mul (norwood, n.j.: ablex, 1996), 161–80; j. a. nelder and r. mead, “a simplex method for function minimization,” computer journal 7, no. 4 (1965): 308–13; t. k. landauer, “research methods in human-computer interaction,” in handbook of human-computer interaction, ed. m. helander (amsterdam: elsevier, 1988), 905–28; r. n. shepard, “toward a universal law of generalization for psychological science,” science 237 (sept. 11, 1987): 1317–323. 9. j. d. d’angelo and j. twining, “comprehension by clicks: d’angelo standards for web page design, and time, comprehension, and preference,” information technology and libraries 19, no. 3 (2000): 125–35. 10. j. d. d’angelo and s. k. little, “successful web pages: what are they and do they exist?” information technology and libraries 17, no. 2 (1998): 71–81. 11. d’angelo and twining, “comprehension by clicks.” 12. j. gerhardt-powals, “cognitive engineering principles for enhancing human-computer performance,” international journal of human-computer interaction 8, no. 2 (1996): 189–211. 13. chan, “evaluation of formats.” 14. ibid. 15. joan m. cherry and joseph p. cox, “world wide web displays of bibliographic records: an evaluation,” proceedings of the 24th annual conference of the canadian association for information science (toronto, ontario: canadian association for information science, 1996), 101–14. 16. joan m. cherry, “bibliographic displays in opacs and web catalogs: how well do they comply with display guidelines?” information technology and libraries 17, no. 3 (1998): 124– 37; cherry and cox, “world wide web displays of bibliographic records.” 17. v. herrero-solana and f. de moya-anegón, “bibliographic displays of web-based opacs: multivariate analysis applied to latin-american catalogs,” libri 51 (june 2001): 75–85. 18. u.s. department of health and human services, researchbased web design and usability guidelines, xxi. 19. international organization for standardization, iso 924111: ergonomic requirements for office work with visual display terminals (vdts)—part 11: guidance on usability (geneva, switzerland: international organization for standardization, 1998); international organization for standardization, iso 9241-12: ergonomic requirements for office work with visual display terminals (vdts)—part 12: presentation of information (geneva, switzerland: international organization for standardization, 1997); international organization for standardization, iso 9241-14: ergonomic requirements for office work with visual display terminals (vdts)—part 14: menu dialogues (geneva, switzerland: international organization for standardization, 1997). 20. u.s. department of health and human services, researchbased web design and usability guidelines. 21. ivan illich, limits to medicine: medical nemesis: the expropriation of health (harmondsworth, n.y.: penguin, 1976). 22. d’angelo and twining, “comprehension by clicks”; gerhardt-powals, “cognitive engineering principles.” 23. c. cheepen, “guidelines for dialogue design—what is our approach? working design guidelines for advanced voice dialogues project. 
paper 3,” (1996), www.soc.surrey.ac.uk/research/reports/voice-dialogues/wp3.html (accessed dec. 29, 2005). 24. w. crawford, “webcats and checklists: some cautionary notes,” information technology and libraries 18, no. 2 (1999): 100–03; cherry, “bibliographic displays in opacs and web catalogs.” 25. p. reed et al., “user interface guidelines and standards: progress, issues, and prospects,” interacting with computers 12, no. 1 (1999): 119–42. 26. d. c. l. ngo, l. s. teo, and j. g. byrne, “formalizing guidelines for the design of screen layouts,” displays 21, no. 1 (2000): 3–15. 27. d. a. norman, “emotion and design: attractive things work better,” interactions 9, no. 4 (2002): 36–42. 28. n. tractinsky, a. s. katz, and d. ikar, “what is beautiful is usable,” interacting with computers 13, no. 2 (2000): 127–45.

a computer system for effective management of a medical library network
richard e. nance and w. kenneth wickham: computer science/operations research center, institute of technology, southern methodist university, dallas, texas, and maryann duggan: systems analyst, south central regional medical library program, dallas, texas

trips (talon reporting and information processing system) is an interactive software system for generating reports to nlm on regional medical library network activity and constitutes a vital part of a network management information system (nemis) for the south central regional medical library program. implemented on a pdp-10/sru 1108 interfaced system, trips accepts paper tape input describing network transactions and generates output statistics on disposition of requests, elapsed time for completing filled requests, time to clear unfilled requests, arrival time distribution of requests by day of month, and various other measures of activity and/or performance. emphasized in the trips design are flexibility, extensibility, and system integrity. processing costs, neglecting preparation of input which may be accomplished in several ways, are estimated at $.05 per transaction, a transaction being the transmittal of a message from one library to another.

introduction the talon (texas, arkansas, louisiana, oklahoma, and new mexico) regional medical library program is one of twelve regional programs established by the medical library assistance act of 1965. the regional programs form an intermediate link in a national biomedical information network with the national library of medicine (nlm) at the apex. unlike most of the regional programs that formed around a single library, talon evolved as a consortium of eleven large medical resource libraries with administrative headquarters in dallas. a major focus of the talon program is the maintenance of a document delivery service, created in march 1970, to enable rapid access to published medical information. twx units located in ten of the resource libraries and at talon headquarters in dallas comprise the major communication channel. in july 1970 a joint program was initiated to develop a statistical reporting system for the talon document delivery network. design and development of the system was done by the computer science/operations research center at southern methodist university, while training and operational procedures were developed by talon personnel.
both parties in the effort view the statistical reporting system as a vital first step in providing talon administrators with a comprehensive network management information system (nemis). an overview of this statistical reporting system, designated as trips (talon reporting and information processing system), and its relation to nemis is discussed in the following paragraphs. the objectives and design characteristics of nemis are stated in (1). design requirements there were two considerations for requirements for a network management information service (nemis) for talon: 1) in what environment would talon function? 2) what should be the objectives of a network management information service and what part does a statistical reporting system play in its development? the talon staff and the design team spent an intensive period in joint discussion of these two questions. talon environment the talon document delivery network operates in an expansive geographical area (figure 1). the decentralized structure of the network enables information transfer between any two resource libraries. in addition talon headquarters serves as a switching center, by accepting loan requests, locating documents, and relaying requests to holding libraries. a requirement placed on talon by nlm is the submission of monthly, quarterly, and annual reports giving statistical data on network activity. these statistics provide details on: 1) requests received by channel used (mail, telephone, twx, other), 2) disposition of requests (rejected, accepted and filled, accepted and unfilled), 3) response time for filled requests, 4) response time for unfilled requests, 5) most frequent user libraries, 6) requests received from each of the other regions, and 7) non-medlars reference inquiries.

fig. 1. location of the eleven resource libraries and talon headquarters.

monthly reports require cumulative statistics on year-to-date performance, and each of the eleven resource libraries and talon headquarters is required to submit a report on its activity. needs and objectives while the immediate need of the talon network was to develop a system to eliminate manual preparation of nlm reports, an initial decision was made to develop software also capable of assisting talon management in policy and decision making. eventual need for a network management information system (nemis) being recognized, the talon reporting and information processing system (trips) was designed as the first step in the creation of nemis. provision of information in a form suitable for analytical studies of policy and decision making (e.g., the message distribution problem described by nance (2)) placed some stringent requirements on trips. for instance, the identification of primitive data elements could not be made from report considerations only; an overall decision had to be made that no sub-item of information would ever be required for a data element. in addition the system demanded flexibility and extensibility, since it was to operate in a highly dynamic environment. these characteristics are quite apparent in the design of trips. trips design trips is viewed as a system consisting of hardware and software components. the description of this system considers: 1) the input, 2) the software subsystems (set of programs), 3) hardware components, and 4) the output.
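a minimal modern sketch (python rather than the original fortran iv/dystal) of how the per-transaction statistics itemized above, such as disposition counts and elapsed time for filled requests, might be tallied from a transaction log; the record fields shown are hypothetical and are not the actual trips data elements.

```python
from collections import Counter
from datetime import date

# hypothetical transaction records; the real trips data elements differ
log = [
    {"channel": "twx", "disposition": "filled", "received": date(1971, 3, 2), "cleared": date(1971, 3, 4)},
    {"channel": "mail", "disposition": "unfilled", "received": date(1971, 3, 3), "cleared": date(1971, 3, 10)},
    {"channel": "twx", "disposition": "rejected", "received": date(1971, 3, 5), "cleared": date(1971, 3, 5)},
]

by_channel = Counter(t["channel"] for t in log)
by_disposition = Counter(t["disposition"] for t in log)
fill_days = [(t["cleared"] - t["received"]).days
             for t in log if t["disposition"] == "filled"]

print("requests by channel:", dict(by_channel))
print("disposition of requests:", dict(by_disposition))
if fill_days:
    print("mean elapsed days for filled requests:", sum(fill_days) / len(fill_days))
```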
emphasis is placed on providing an overview, and no effort is made to give a detailed description. the environment in which trips is to operate is defined in a single file (for25.dat). this file assigns network parameters, e.g., number of reporting libraries, library codes, and library titles. the file is accessed by subprograms written in fortran iv and dystal (3), the latter being a set of fortran iv subprograms, termed dystal functions, that perform primitive list processing and dynamic storage allocation operations. because it requires only fortran iv, trips can be implemented easily on most computers. input a transaction log, maintained by each regional library and talon headquarters, constitutes the basic input to trips. copies of log sheets are used to create paper tape descriptions of the transactions. if and when compatibility is achieved between standard twx units and telephone entry to computer systems, the input could be entered directly by each regional library. (this is technically possible at present.) currently, talon headquarters is converting the transaction descriptions to machine readable form. initial data entry under normal circumstances is pictured in figure 2, which shows the sequence of operations and file accesses in two phases: 1) data entry and 2) report generation. data entry in turn comprises 1) collecting statistics, 2) diagnosis and verification of input data, and 3) backup of original verified input data. trips is designed to be extremely sensitive to input data. all data is subjected to an error analysis, and a specific file (for22.dat) is used to collect errors detected or diagnosed in the error analysis routine. only verified data records are transmitted to the statistical accumulation file (for20.dat).

fig. 2. trips structure: statistical collection and report generation phases, producing reimbursable and non-reimbursable statistics reports.

software subsystems trips comprises seven subsystems or modules. within each module are several fortran iv subprograms, dystal functions, and/or pdp-10 systems programs discussed under hardware components in the following section:
newy: run at the beginning of each year, newy builds an in-core data structure and transfers it to disk for each resource library in the network. it further creates the original data backup disk file (for23.dat). after disk formatting, record (the accessing and storage module) may be activated to begin accumulating statistics for the new year.
newq: run between quarters, newq purges the past quarter statistics for each library and prepares file for23.dat for the next quarter. the report for the quarter must be generated before newq is executed.
newm: run between months, newm purges the monthly statistics for each regional library and prepares file for23.dat for the backing up of next month's data.
dumpl: the utility module causes a dystal dump of the data base.
record: the accessing and storage module record incorporates the error diagnosis on input and the entry of validated data records into file for23.dat. no data record with an indicated error is permitted, and erroneous records are flagged for exception reporting. the error report (ermes.dat) may be printed on the teletype or line printer after execution of record.
report: the reporting module report generates all reimbursable statistics on a month-to-date, quarter-to-date, and year-to-date basis.
manage: utilization of trips as a network management tool is afforded by manage, which combines statistics from reimbursable and non-reimbursable transactions to generate a report providing measures of total network activity and performance.

the primary files used by the software subsystems are described briefly in table 1.

table 1. primary files in trips (file name: function of the file; comments; file type)
for25.dat: contains the system definition parameters and initialization values; created from card input to assure proper format; ascii.
for20.dat: statistical accumulation for validated data records; two parts: (1) input translator data structure, and (2) statistical data base; binary.
for21.dat: generation of reports from information in for20.dat; carriage control characters must be included to generate reports; ascii.
for22.dat: collects data records diagnosed as in error; errors accumulated in for22.dat are transmitted to ermes.dat for output; ascii.
for23.dat: enables creation and updating of the backup magnetic tape; each month's validated records added to tape; ascii.
for24.dat: enables recovery read of backup tape; tape information stored prior to transfer of file information to for20.dat; binary.
ermes.dat: serves to output messages on data records diagnosed as in error; if 6 or fewer errors occur, ermes is not created and messages are output to the teletype; if more than 6 errors, an estimate of typing time is given to the user, who has the option of printing them on the teletype or in a report form on the line printer; ascii.

a major concern in any management information system is the system integrity. in addition to the diagnosis of input data, trips concatenates sequential copies of disk file for23.dat to provide a magnetic tape backup containing all valid data records for the current year. a failsafe tape, containing all trips programs, is also maintained. hardware components conversion of transaction information to machine readable form is done off line currently. using a standard twx with ascii code, paper tapes are created and spliced together. fed through a paper tape reader to a pdp-10 (digital equipment company), the input data is submitted to trips. control of trips is interactive, with the user monitoring program execution from a teletype. all file operations are accomplished using the pdp-10 via the teletype, and the output reports are created on a high-speed line printer. with smu's pdp-10 and sru 1108 interface, report generation can be done on line printers at remote terminals to the sru 1108 as well. output trips output consists of a report for each library in the network and a composite for the entire network. the report may be limited to reimbursable statistics or include all statistics. information includes: 1) errors encountered in the input phase, 2) number of requests received by channel, 3) disposition of requests (i.e., rejected, accepted/filled, accepted/unfilled, etc.), 4) elapsed time for completing filled requests or clearing unfilled requests, 5) geographic origin of requests, 6) titles for which no holdings were located within the region, 7) types of requesting institutions, 8) arrival time distribution of requests by day of month, 9) invoice for reimbursement by talon, 10) node/network dependency coefficient as described by (4). summary trips is now entering its operational phase.
training of personnel at the resource libraries is concluded, and data on transactions are being entered into the system. input errors have decreased significantly (from fifteen or twenty percent to approximately two percent). talon personnel are enthusiastic, and needless to say the regional library staffs are happy to see a bothersome, time-consuming manual task eliminated. in summary, the following characteristics of trips deserve repeating: 1) with its modular construction, it is flexible and extensible. 2) implemented in dystal and fortran iv, it should allow installation on most computers without major modifications. 3) designed to operate in an interactive environment, it can be modified easily to function in a batch processing environment. 4) trips is extremely sensitive to system integrity, providing diagnosis of input data, reporting of errors, magnetic tape backup of data files, and a system failsafe tape. 5) definition of primitive data elements and the structural design of trips enable it to serve as the nucleus of a network management information system (nemis) as well as to generate reports required by nlm. 6) currently accepting paper tape as the input medium, trips could be modified easily to accept punched card input and with more extensive changes could derive the input information during the message transfer among libraries. finally, the processing cost of operating trips, neglecting the conversion to paper tape, is estimated to be $.05 per transaction (a message transfer from one library to another). extensive and thorough documentation of trips has been provided. availability of this documentation is under review by the funding agency. acknowledgment work described in this article was done under contract hew phs 1 g04 lm 00785-01, administered by the south central regional medical library program of the national library of medicine. the authors express their appreciation to dr. u. narayan bhat and dr. donald d. hendricks for their contributions to this work.

references
1. "nemis - a network management information system," status report of the south central regional medical library program, october 26, 1970.
2. nance, richard e.: "an analytical model of a library network," journal of the american society for information science, 21: (jan.-feb. 1970), 58-66.
3. sakoda, james m.: dystal dynamic storage allocation language manual, (providence, r. i.: brown university, 1965).
4. duggan, maryann, "library network analysis and planning (libnat)," journal of library automation, 2: (1969), 157-175.

thmanager: an open source tool for creating and visualizing skos
javier lacasta, javier nogueras-iso, francisco javier lópez-pellicer, pedro rafael muro-medrano, and francisco javier zarazaga-soria
javier lacasta (jlacasta@unizar.es) is assistant professor, javier nogueras-iso (jnog@unizar.es) is assistant professor, francisco javier lópez-pellicer (fjlopez@unizar.es) is research fellow, pedro rafael muro-medrano (prmuro@unizar.es) is associate professor, and francisco javier zarazaga-soria (javy@unizar.es) is associate professor in the computer science and systems engineering department, university of zaragoza, spain.

knowledge organization systems denotes formally represented knowledge that is used within the context of digital libraries to improve data sharing and information retrieval. to increase their use, and to reuse them when possible, it is vital to manage them adequately and to provide them in a standard interchange format. simple knowledge organization systems (skos) seem to be the most promising representation for the type of knowledge models used in digital libraries, but there is a lack of tools that are able to properly manage it. this work presents a tool that fills this gap, facilitating their use in different environments and using skos as an interchange format.
u nlike the largely unstructured information avail­ able on the web, information in digital libraries (dls) is explicitly organized, described, and man­ aged. in order to facilitate discovery and access, dl sys­ tems summarize the content of their data resources into small descriptions, usually called metadata, which can be either introduced manually or automatically generated (index terms automatically extracted from a collection of documents). most dls use structured metadata in accor­ dance with recognized standards, such as marc21 (u.s. library of congress 2004) or dublin core (iso 2003). in order to provide accurate metadata without ter­ minological dispersion, metadata creators use different forms of controlled vocabularies to fill the content of typi­ cal keyword sections. this increase of homogeneity in the descriptions is intended to improve the results provided by search systems. to facilitate the retrieval process, the same vocabularies used to create the descriptions are usu­ ally used to simplify the construction of user queries. as there are many different schemas for modeling controlled vocabularies, the term knowledge organization systems (kos) is intended to encompass all types of schemas for organizing information and promoting knowledge management. as hodge (2000) says, “a kos serves as a bridge between the users’ information need and the material in the collection.” some types of kos can be highlighted. examples of simple types are glossaries, which are only a list of terms (usually with definitions), and authority files that control variant ver­ sions of key information (such as geographic or personal names). more complex are subject headings, classifica­ tion schemes, and categorization schemes (also known as taxonomies) that provide a limited hierarchical structure. at a more complex level, kos includes thesauri and less traditional schemes, such as semantic networks and ontologies, that provide richer semantic relations. there is not a single kos on which everyone agrees. as lesk (1997) notes, while a single kos would be advantageous, it is unlikely that such a system will ever be developed. culture constrains the knowledge classifi­ cation scheme because what is meaningful to one area is not necessarily meaningful to another. depending on the situation, the use of one or another kos has its advan­ tages and disadvantages, each one having its place. these schemas, although sharing many characteristics, usually have been treated heterogeneously, leading to a variety of representation formats to store them. thesauri are an example of the format heterogeneity problem. according to iso­2788 (norm for monolingual thesauri) (iso 1986), a thesaurus is a set of terms that describe the vocabulary of a controlled indexing language, formally organized so that the a priori relationships between con­ cepts (for example, synonyms, broader terms, narrower terms, and related terms) are made explicit. this stan­ dard is complemented with iso­5964 (iso 1985), which describes the model for multilingual thesauri, but none of them describe a representation format. the lack of a stan­ dard representation model has caused a proliferation of incompatible formats created by different organizations. so each organization that wants to use several external thesauri has to create specific tools to transform all of them to the same format. 
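the a priori relationships that iso-2788 makes explicit (preferred terms, synonyms, and broader, narrower, and related terms) can be pictured with a small data-model sketch. the class below is purely illustrative and is not taken from any of the tools discussed in this article; the names are ours.

import java.util.ArrayList;
import java.util.List;

// minimal sketch (not part of any cited tool): one entry of an iso-2788 style
// thesaurus, holding the a priori relationships the standard makes explicit.
public class ThesaurusEntry {
    private final String preferredTerm;                              // the descriptor itself
    private final List<String> useFor = new ArrayList<>();           // synonyms / non-preferred terms
    private final List<ThesaurusEntry> broader = new ArrayList<>();  // BT
    private final List<ThesaurusEntry> narrower = new ArrayList<>(); // NT
    private final List<ThesaurusEntry> related = new ArrayList<>();  // RT

    public ThesaurusEntry(String preferredTerm) {
        this.preferredTerm = preferredTerm;
    }

    public void addUseFor(String synonym) { useFor.add(synonym); }

    // keep BT/NT symmetric so the hierarchy can be walked in both directions
    public void addNarrower(ThesaurusEntry child) {
        narrower.add(child);
        child.broader.add(this);
    }

    // RT is symmetric as well
    public void addRelated(ThesaurusEntry other) {
        related.add(other);
        other.related.add(this);
    }

    public String getPreferredTerm() { return preferredTerm; }
    public List<ThesaurusEntry> getNarrower() { return narrower; }
}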
in order to eliminate the heterogeneity of representation formats, the w3c initiative has promoted the development of simple knowledge organization systems (skos) (miles et al. 2005) for use in the semantic web environment. skos has been created to represent simple kos, such as subject heading lists, taxonomies, classification schemes, thesauri, folksonomies, and other types of controlled vocabulary, as well as concept schemes embedded in glossaries and terminologies. although skos has been proposed only recently, the number and importance of organizations involved in its creation process (and that publish their kos in this format) indicate that it will probably become a standard for kos representation. skos provides a rich, machine-readable language that is very useful to represent kos, but nobody would expect to have to create it manually or by just using a general-purpose resource description framework (rdf) editor (skos is rdf-based). however, in the digital library area, there are no specialized tools that are able to manage it adequately. therefore, this work tries to fill this gap, describing an open source tool, thmanager, that facilitates the construction of skos-based kos. although thmanager has been created to manage thesauri, it also is appropriate to create and manage any other models that can be represented using the skos format. this article describes the thmanager tool, highlighting its characteristics. thmanager's layer-based architecture permits the reuse of the components created for the management of thesauri in other applications where they are also needed. for example, it facilitates the selection of values from a controlled vocabulary in a metadata creation tool, or the construction of user queries in a search client. the tool is distributed as open source software accessible through the sourceforge platform (http://thmanager.sourceforge.net/).

■ state of the art in thesaurus tools and representation models
the problem of creating appropriate content for thesauri is of interest in the dl field and other related disciplines, and an increasing number of software packages have appeared in recent years for constructing thesauri. for instance, the web site of willpower information (http://www.willpower.demon.co.uk/thessoft.htm) offers a detailed revision of more than forty tools. some are only available as a module of a complete information storage and retrieval system, but others also allow the possibility of working independently of any other software. among these thesaurus creation tools, one may note the following products: ■ bibliotech (http://www.inmagic.com/).
this is a multiplatform tool that forms part of bibliotech pro integrated library system and can be used to build an ansi/niso standard thesaurus (standard z39.19 [ansi 1993]). ■ lexico (http://www.pmei.com/lexico.html). this is a java­based tool that can be accessed and/or manip­ ulated over the internet. thesauri are saved in a text­based format. it has been used by the u.s. library of congress to manage such vocabularies and thesauri as the thesaurus for graphic materials, the global legal information network thesaurus, the legislative indexing vocabulary, and the symbols of american libraries listing. ■ multites (http://www.multites.com/) is a windows­ based tool that provides support for ansi/niso relationships plus user­defined relationships and comment fields for an unlimited number of thesauri (both monolingual and multilingual). ■ termtree 2000 (http://www.termtree.com.au/) is a windows­based tool that uses access, sql server, or oracle for data storage. it can import and export trim thesauri (a format used by the towers records information management system [http://www.towersoft.com/]), as well as a defined termtree 2000 tag format. ■ webchoir (http://www.webchoir.com/) is a family of client­server web applications that provides dif­ ferent utilities for thesaurus management in multiple dbms platforms. termchoir is a hierarchical infor­ mation organizing and searching tool that enables one to create and search varieties of hierarchical subject categories, controlled vocabularies, and tax­ onomies based on either predefined standards or a user­defined structure, and is then exported to an xml­based format. linkchoir is another tool that allows indexers to describe information sources using terminology organized in termchoir. and seekchoir is a retrieval system that enables users to browse thesaurus descriptors and their references (broader terms, related terms, synonyms, and so on). ■ synaptica (http://www.synaptica.com/) is a client­ server web application that can be installed locally on a client’s intranet or extranet server. thesaurus data is stored in a sql server or oracle database. the application supports the creation of electronic the­ sauri in compliance with the ansi/niso standard. the application allows the exchange of thesauri in csv (comma­separated values) text format. ■ superthes (batschi et al. 2002) is a windows­based tool that allows the creation of thesauri. it extends the ansi/niso relationships, allowing many pos­ sible data types to enrich the properties of a concept. it can import and export thesauri in xml and tabular format. ■ tematres (hhttp://r020.com.ar/tematres/) is a web application specially oriented to the creation of thesauri, but it also can be used to develop web navigation structures or to manage the documentary languages in use. the thesauri are stored in a mysql database. it provides the created thesauri in zthes (tylor 2004) or in skos format. finally, it must be mentioned that, given that thesauri can be considered as ontologies specialized in organiz­ ing terminology (gonzalo et al. 1998), ontology editors have sometimes been used for thesaurus construction. a detailed survey of ontology editors can be found in the denny study (2002). all of these tools (desktop or web­based) present some problems in using them as general thesaurus editors. the main one is the incompatibility in the interchange formats that they support. these tools also present integration problems. 
some are deeply integrated in bigger sys­ tems and cannot easily be reused in other environments because they need specific software components to work article title | author �1thmanager | lacasta, nogueras-iso, lópez-pellicer, muro-medrano, and zarazaga-soria �1 (as dbms to store thesauri). others are independent tools (can be considered as general­purpose thesaurus editors), but their architecture does not facilitate their integration within other information management tools. and most of them are not open source tools, so there is no possibility to modify them to improve their functionality. focusing on the interchange format problem, the iso­5964 standard (norm for multilingual thesauri) is currently undergoing review by iso tc46/sc 9, and it is expected that the new modifications will include a stan­ dard exchange format for thesauri. it is believed that this format will be based on technologies such as rdf/xml. in fact, some initiatives in this direction have already arisen: ■ the adl thesaurus protocol (janée et al. 2003) defines an xml­ and http­based protocol for access­ ing thesauri. as a result of query operations, portions of the thesaurus encoded in xml are returned. ■ the language independent metadata browsing of european resources (limber) project has published a thesaurus interchange format in rdf (matthews et al. 2001). this work introduces an rdf representa­ tion of thesauri, which is proposed as a candidate thesaurus interchange format. ■ the california environmental resources evaluation system (ceres) and the nbii biological resources division are collaborating in a thesaurus partnership project (ceres/nbii 2003) for the development of an integrated environmental thesaurus and a thesau­ rus networking toolset for metadata development and keyword searching. one of the deliverables of this project is an rdf format to represent thesauri. ■ the semantic web advanced development for europe (swad­europe 2001) project includes the swad­europe thesaurus activity, which has defined the skos, a set of specifications to represent the knowledge organization systems (kos) on the semantic web (thesauri between them). the british standards bs­5723 (bsi 1987) and bs­6723 (bsi 1985) (equivalent to the international iso­2788 and iso­5964) also lack a representation format. the british standards institute idt/2/2 working group is now developing the bs­8723 standard that will replace them and whose fifth part will describe the exchange formats and protocols for interoperability of thesauri. the objec­ tive of this working group is to promote the standard to iso, to replace the iso­2788 and iso­5964. here, it is important to remark that given the direct involvement of the idt/2/2 working group with skos development; probably the two initiatives will not diverge. the new representation format will be, if not exactly skos, at least skos­based. taking into account all these circumstances, skos seems to be the most adequate representation model to store thesauri. given that skos is rdf­based, it can be created using any tool that is able to manage rdf (usually used to edit ontologies); for example, swoop (mindswap group 2006), protégé (noy et al. 2000), or triple20 (wielemaker et al. 2005). the problem with these tools is that they are too complex for editing and visualizing such a simple model as skos. they are thought to create complex ontologies, so they provide too many options not spe­ cifically adapted to the type of relations in skos. 
in addition, they do not allow an integrated management of collection of thesauri and other types of controlled vocabularies as needed in dl processes (for example, the creation of metadata of resources, or the construction of queries in a search system). ■ skos model skos is a representation model for simple knowledge organization systems, such as subject heading lists, tax­ onomies, classification schemes, thesauri, folksonomies, other types of controlled vocabulary, and also concept schemes embedded in glossaries and terminologies. this section describes the model, providing characteristics, showing the state of development, and indicating the problems found to represent some types of kos. skos was initially developed within the scope of the semantic web advanced development for europe (swad­europe 2001). swad­e was created to support w3c’s semantic web initiative in europe (part of the ist­7 programme). skos is based on a generic rdf schema for thesauri that was initially produced by the desire project (cross et al. 2001), and further developed in the limber project (matthews et al. 2001). it has been developed as a draft of an rdf schema for thesauri com­ patible with relevant iso standards, and later adapted to support other types of kos. among the kos already published using this new format are gemet (eea 2001), agrovoc (fao 2006), adl feature types (hill and zheng 1999), and some parts of wordnet lexical data­ base (miller 1990), all of them available on the skos project web page. skos is a collection of three different rdf schema application profiles: skos­core, to store common prop­ erties and relations; skos­mapping, whose purpose is to describe relations between different kos; and skos­ extension, to indicate specific relations and properties only contained in some type of kos. for the first step of the development of the thmanager tool, only the most stable part of skos has been consid­ ered. figure 1 shows the part of skos­core used. the rest of skos­core is still unstable, so its support has been delayed until it is approved. skos­mapping and skos­extension are still in their first steps of develop­ �2 information technology and libraries | september 2007�2 information technology and libraries | september 2007 ment and are very unstable, so their management in thmanager also has been delayed until the creation of stable versions. in skos­core, a kos (in our case, usually a the­ saurus) consists of a set of concepts (labelled as skos: concept) that are grouped by a concept scheme (skos: conceptscheme). to distinguish between different mod­ els provided, the skos:conceptscheme contains a uri that identifies it, but to describe the model content to humans, metadata following the dublin core standard also can be added. the relation of the concept scheme with the concepts of the kos is done through the skos: hastopconcept relation. this relation points at the most general concepts of the kos (top concepts), which are used as entry points to the kos structure. in skos, each concept consists of a uri and a set of properties and relations to other concepts. among the properties, skos.preflabel and skos.altlabel provide labels for a concept in different languages. the first one is used to show the label that better identifies a concept (for the­ sauri it must be unique). the second one is an alternative label that contains synonyms or spelling variations of the preferred label (it is used to redirect to the preferred label of the concept). 
the skos concepts also can contain three other properties called skos.scopenote, skos.definition, and skos.example. they contain annotations about the ways to use a concept, a definition, or examples of use in differ­ ent languages. last, the skos.prefsymbol and skos.altsymbol properties are used to provide a preferred or some alter­ native symbols that graphically represent the concept. for example, a graphical representation is very useful to identify the meaning of a mathematical formula. another example is a chemical formula, where a graphical repre­ sentation of the structure of the substance also provides valuable information to the user. with respect to the relations, each concept indicates by means of the skos:inscheme relation in which concept scheme it is contained. the skos.broader and the skos.narrower relations are inverse relations used to model the generalization and specialization characteristics present in many kos (including thesauri). skos.broader relates to more general concepts, and skos.narrower to more spe­ cific ones. the skos.related relation describes associative relationships between concepts (also present in many thesauri), indicating that two concepts are related in some way. with these properties and relations, it is perfectly possible to represent thesauri, taxonomies, and other types of controlled vocabularies. however, there is a problem for the representation of classification schemes that provide multiple coding of terms, as there is no place to store this information. under this category, one may find classification schemes such as iso­639 (iso 2002) (iso standard for coding of languages), which proposes different types of alphanumeric codes (for example, two letters and three letters). for this special case, the skos working group proposes the use of the property skos.notation. although this property is not in the skos vocabulary yet, it is expected to be added in future versions. given the need to work with these types of schemes, this property has been included in the thmanager tool. ■ thmanager architecture this section presents the architecture of thmanager tool. this tool has been created to manage thesauri in skos, but it also is a base infrastructure that facilitates the management of thesauri in dls, simplifying their inte­ gration in tools that need to use thesauri or other types of controlled vocabularies. in addition, to facilitate its use on different computer platforms, thmanager has been developed using the java object­oriented language. the architecture of thmanager tool is shown in figure 2. the system consists of three layers: first, a repository layer where thesauri are stored and identified by means of associated metadata describing them; second, a per­ sistence layer that provides an api for access to thesauri stored in the repository; and third, a gui layer that offers different graphical components to visualize thesauri, to search by their properties, and to edit them in different ways. the thmanager tool is an application that uses the different components provided by the gui layer to allow the user to manage the thesauri. in addition, the layered architecture allows other applications to use some of the visualization components or the method provided by the persistence layer to provide access to thesauri. 
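as a concrete illustration of the skos-core elements described above (a concept scheme, top concepts, preferred and alternative labels per language, and the broader/narrower links), the following sketch builds a two-concept scheme with the jena library, the same rdf library that the thmanager persistence layer relies on. the concept uris and labels are invented, the sketch uses the current apache jena package names (in 2007 the prefix was com.hp.hpl.jena), and it is not thmanager code.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class SkosCoreSketch {
    // skos-core namespace used by the drafts discussed in the article
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        m.setNsPrefix("skos", SKOS);

        Property prefLabel = m.createProperty(SKOS, "prefLabel");
        Property altLabel = m.createProperty(SKOS, "altLabel");
        Property broader = m.createProperty(SKOS, "broader");
        Property narrower = m.createProperty(SKOS, "narrower");
        Property inScheme = m.createProperty(SKOS, "inScheme");
        Property hasTopConcept = m.createProperty(SKOS, "hasTopConcept");
        Resource conceptType = m.createResource(SKOS + "Concept");
        Resource schemeType = m.createResource(SKOS + "ConceptScheme");

        // the concept scheme is identified by a uri (all uris below are invented)
        Resource scheme = m.createResource("http://example.org/scheme/water", schemeType);

        // a top concept: one preferred label per language, plus the entry point from the scheme
        Resource water = m.createResource("http://example.org/concept/water", conceptType)
                .addProperty(prefLabel, "water", "en")
                .addProperty(prefLabel, "agua", "es")
                .addProperty(inScheme, scheme);
        scheme.addProperty(hasTopConcept, water);

        // a narrower concept, linked in both directions as in most thesauri,
        // with an alternative label for a spelling variation
        Resource groundwater = m.createResource("http://example.org/concept/groundwater", conceptType)
                .addProperty(prefLabel, "groundwater", "en")
                .addProperty(altLabel, "ground water", "en")
                .addProperty(inScheme, scheme)
                .addProperty(broader, water);
        water.addProperty(narrower, groundwater);

        // serialize to rdf/xml, the form in which skos files are exchanged
        m.write(System.out, "RDF/XML-ABBREV");
    }
}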
the main features that have guided the design of these layers have been the following: a metadata-driven design, efficient management of thesauri, the possibility of interrelating thesauri, and the reusability of thmanager components. the following subsections describe these characteristics in detail.

figure 1. skos model

metadata-driven design
a fundamental aspect in the repository layer is the use of metadata to describe thesauri. thmanager considers metadata of thesauri as basic information in the thesaurus management process, stored in the metadata repository and managed by the metadata manager. the reason for this metadata-driven design is that thesauri must be described and classified to facilitate the selection of the one that best fits the user's needs, allowing the user to search them not only by their name but also by the application domain or the associated geographical area, among others. the lack of metadata makes the identification of useful thesauri (provided by other organizations) difficult, producing a low reuse of them in other contexts. to describe thesauri in our service, a metadata profile based on dublin core has been created. the reason to use dublin core as the basis of this profile has been its extensive use in the metadata community. it provides a simple way to describe a resource using very general metadata elements, which can be easily matched with complex domain-specific metadata standards. additionally, dublin core also can be extended to define application profiles for specific types of resources. following the metadata profile hierarchy described in tolosana-calasanz et al. (2006), the thesaurus metadata profile refines the definition and domain of dublin core elements, and it includes two new elements (metadata language and metadata identifier) to appropriately identify the metadata records describing a thesaurus. the profile for thesauri has been described using the iemsr format (heery et al. 2005) and is distributed with the tool. iemsr is an rdf-based format created by the jisc ie metadata schema registry project to describe metadata application profiles. figure 3 shows the metadata created for the gemet thesaurus (the resource), expressed as a hedgehog graph (a reinterpretation of rdf triplets: resources, named properties, and values). the purpose of these metadata is not only to simplify thesaurus location for a user, but also to facilitate the identification of thesauri useful for a specific task in machine-to-machine communication. for instance, one may be interested only in thesauri that cover a restricted geographical area or deal with a specific theme.

efficient thesauri storage
thesauri vary enormously in size, ranging from hundreds of concepts and properties to millions. so the time spent on load, navigation, and search processes is a functional constraint for a tool that has to manage them. skos is rdf-based, and because reading rdf to extract the content is a slow process, the format is not appropriate for internal storage. to provide better access time, thmanager transforms skos into a binary format when a new skos is imported. the persistence layer provides unified access to the thesaurus repository. this layer is used by the gui layer to access the thesauri, but it also can be employed by other tools that need to use thesauri outside a desktop environment (for example, a thematic search system accessible through the web that requires browsing a thesaurus to facilitate construction of user queries).
figure 2. thmanager (kos manager) architecture: a repository layer (concept repository, metadata repository, and concept core), a persistence layer (thesaurus persistence manager, metadata manager, concept manager, disambiguation tool, skos core and skos mapping support, and the jena api), and a gui layer (visualization, edition, search, viewer generator, and gui manager), used by the thmanager application and by other desktop tools and web services that use thesauri.

figure 3. metadata of the gemet thesaurus, expressed as a hedgehog graph: dc:title "general multilingual environmental thesaurus", dcterms:alternative "gemet", dc:creator european topic centre on catalogue of data sources (etc/cds) and european environment agency (eea), dc:subject terms from the unesco classification (environmental sciences and engineering; pollution, disasters and security; natural resources; natural sciences), dc:description "gemet was conceived as a 'general' thesaurus, aimed to define a common general language, a core of general terminology for the environment", dc:publisher european environment agency (eea), dc:date 2005-03-07, dc:type text.reference materials.ontology, dc:format skos, dc:identifier http://www.eionet.eu.int/gemet, dc:language en, es, fr, and others, dc:rights "it can be used whenever there is no commercial profit", dc:relation european environment information and observation network, dc:contributor us environmental protection agency (epa), dc:source eurovoc thesaurus, plus the metadata language and metadata identifier elements of the profile.

this layer performs the transformation of skos to the binary format when a thesaurus is imported. the transformation is provided using the jena library, a popular library to manipulate rdf documents that allows storing them in different kinds of repositories (http://jena.sourceforge.net/). jena provides an open model that can be extended with specialized modules to use other ways of storage, making it possible to easily change the storage format for another that is more efficient if needed. the data structure used is shown in figure 4. the model is an optimized representation of the information given by the rdf triplets.
the concepts map contains the concepts and their associated relations in the form of key-value pairs: the key is a uri identifying a concept, and the value is a relations object containing the properties of the concept. a relations object is a map that stores the properties of one concept in the form of (name, value) pairs. the keys used for this map are the names of the typical property types in the skos model (for example, narrower or broader). the only special cases for encoding these property types in the proposed data structure occur when they have a language attribute (for example, preflabel, definition, or scopenote). in those cases, we propose the use of a [lang] suffix to distinguish the property type for a particular language. for instance, preflabel_en indicates a preflabel property type in english. additionally, it must be noted that the data type of the property values assigned to each key in the relations map depends on the semantics given to each property type. the data types fall into the following categories: a string for a preflabel property type; a list of strings for altlabel, definition, scope note, and example property types; a uri for a prefsymbol property type; a list of uris for narrower, broader, related, and altsymbol property types; and a list of notation objects for a notation property type. the data type used for notation values is a complex object because there may be different notation types. a notation object consists of type and value attributes. the type attribute is a uri that identifies a particular notation type and qualifies the associated notation value. additionally, with the objective of increasing the speed of some operations (for example, navigation or search), some optimizations have been added. first, the uris of the top concepts are stored in the topconcepts list. this list contains redundant information, given that those concepts also are stored in the concepts map, but it makes their location immediate. second, to speed up the search of concepts and the drawing of the alphabetic viewer, the translations map has been added. for each language supported by the thesaurus, this map contains a list of translationterm objects, that is, pairs of concept uri and preferred label, ordered by preflabel. it also contains redundant information that allows the immediate creation of the alphabetic viewer for a language, simplifying the search process; as can be seen later, this does not introduce a big overhead in load time. in addition, if no alphabetic viewer and search are needed, this structure can be removed without affecting the hierarchical viewer. this solution has proven to be useful for managing the kind of thesauri we use (they do not surpass 50,000 concepts and about 330,000 properties), loading them into memory on an average computer in a reasonable time, and allowing immediate navigation and search (see section 6).

figure 4. persistence model: a concepts map from concept uris to relations objects, a relations map from property keys (with [lang] suffixes where needed) to typed values, a redundant topconcepts list of uris, and a translations map holding, per language, a list of translationterm (concept uri, preferred label) pairs.
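a hedged sketch of the kind of in-memory structure just described is given below: a concepts map keyed by uri, a relations map whose keys carry a [lang] suffix where needed, a redundant topconcepts list, and a per-language translations map kept sorted for the alphabetic viewer. the class and field names are ours, not thmanager's, and java serialization is only one of several ways the binary snapshot could be written.

import java.io.Serializable;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// illustrative sketch of the binary persistence model described above;
// the names are ours, not thmanager's actual classes.
public class ThesaurusStore implements Serializable {

    // key: property name, optionally suffixed with the language, e.g. "prefLabel_en";
    // value type depends on the property (string, list of strings, list of uris, ...)
    public static class Relations extends HashMap<String, Object> implements Serializable {}

    public static class Notation implements Serializable {
        public URI type;      // identifies a particular notation scheme (e.g. two- or three-letter codes)
        public String value;  // the code itself
    }

    public static class TranslationTerm implements Serializable {
        public URI concept;   // concept uri
        public String label;  // preferred label in one language
    }

    // every concept of the thesaurus, keyed by its uri
    public final Map<URI, Relations> concepts = new HashMap<>();

    // redundant entry points: uris of the top concepts, for immediate access
    public final List<URI> topConcepts = new ArrayList<>();

    // redundant per-language lists of (concept uri, preferred label) pairs,
    // kept sorted by label so the alphabetic viewer and the search are immediate;
    // key is the language code, e.g. "en"
    public final Map<String, List<TranslationTerm>> translations = new HashMap<>();
}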
interrelation of thesauri
the vast choice of thesauri available nowadays implies an undesired effect of content heterogeneity. although a thesaurus is usually created for a specific application domain, some of the concepts defined in thesauri from different application domains may be equivalent. in order to facilitate cross-domain classification of resources, users would benefit from the possibility of knowing the connections of a thesaurus in their application domain to thesauri used in other domains. however, it is difficult to manually detect the implicit links between those different thesauri. therefore, in order to automatically facilitate these interthesaurus connections, the persistence layer of the thmanager tool provides an interrelation function that relates a thesaurus to an upper-level lexical database (the concept core displayed in figure 2). the interrelation mechanism is based on the method presented in nogueras-iso, zarazaga-soria, and muro-medrano (2005). it is an unsupervised disambiguation method that uses the relations between concepts as disambiguation context. it applies a heuristic voting algorithm to select the most adequate sense of the used concept core for each thesaurus concept. at the moment, the concept core is the wordnet lexical database. wordnet is a large english lexical database that groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms (synsets), each expressing a distinct concept. those synsets are interlinked by means of conceptual-semantic and lexical relations. the interrelation component has been conceived as an independent module that receives a thesaurus as input in skos and returns its relation to the concept core using an extended version of the skos mapping model (miles and brickley 2004). this model, as commented before, is a part of skos that allows describing exact, major, and minor mappings between concepts of two different kos (in this case between a thesaurus and the common core). skos mapping is still in an early stage of development and has been extended in order to provide the needed functionality. the base skos mapping provides the map:exactmatch, map:majormatch, and map:minormatch relations to indicate the degree of relation between two concepts. given that the interrelation algorithm cannot ensure that a mapping is 100 percent exact, only the major and minor match properties are used. the algorithm returns a list of possible mappings with the lexical database for each concept: the one with the highest probability is assigned as a major match, and the rest are assigned as minor matches. to store the interrelation probability, skos mapping has been extended by adding a blank node with the reliability of the mapping. also, to be able to know which concepts of which thesauri are equivalent to one of the common core, inverse relations of map:majormatch and map:minormatch have been created. an example of skos mapping can be seen in figure 5. there, concept 340 of the gemet thesaurus (alloy) is correctly mapped to the wordnet concept number 13751474 (alloy, metal) with a probability of 91.007 percent; an unrelated minor mapping also is found, but it is given a low probability (8.992 percent).

figure 5. skos mapping extension: gemet concept 340 (alloy) mapped to wordnet 2.0 synset 13751474 (alloy, metal) as a major match with probability 91.00727, and to synset 13664144 (admixture, alloy) as a minor match with probability 8.992731.
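the article does not spell out the exact rdf shape of the extended mapping beyond the blank node that carries the reliability of each match, so the sketch below assumes one plausible shape for the alloy example of figure 5: map:majormatch and map:minormatch point to blank nodes holding an iaaa:probability value and a link (here called iaaa:target, a made-up property name) to the wordnet concept. the namespace uris are also assumptions made for illustration only.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

public class SkosMappingSketch {
    // namespace uris below are assumptions for illustration only
    static final String MAP = "http://www.w3.org/2004/02/skos/mapping#";
    static final String IAAA = "http://iaaa.cps.unizar.es/mapping#";

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        Property majorMatch = m.createProperty(MAP, "majorMatch");
        Property minorMatch = m.createProperty(MAP, "minorMatch");
        Property probability = m.createProperty(IAAA, "probability");
        Property target = m.createProperty(IAAA, "target"); // made-up link to the matched concept

        Resource alloy = m.createResource("http://www.eionet.eu.int/gemet/concept/340");
        Resource wnAlloy = m.createResource("http://wordnet.princeton.edu/wordnet_2.0/13751474");
        Resource wnAdmixture = m.createResource("http://wordnet.princeton.edu/wordnet_2.0/13664144");

        // major match: an anonymous (blank) node carrying the reliability of the mapping
        Resource major = m.createResource()
                .addProperty(probability, "91.00727")
                .addProperty(target, wnAlloy);
        alloy.addProperty(majorMatch, major);

        // the remaining candidate senses become minor matches with lower probabilities
        Resource minor = m.createResource()
                .addProperty(probability, "8.992731")
                .addProperty(target, wnAdmixture);
        alloy.addProperty(minorMatch, minor);

        m.write(System.out, "TURTLE");
    }
}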
reusability of thmanager components
on top of the api layer, the gui layer has been constructed. this layer contains several graphical interfaces to provide different types of viewers, searchers, and editors for thesauri. this layer is used as the base for the construction of the thmanager tool. the tool groups a subset of the provided components, relating them to obtain a final user application that allows the management of the stored thesauri, their visualization (navigation by the concept relations), their edition, and their importation and exportation using the skos format. the thmanager tool has been created not only as an independent tool to facilitate thesauri management, but also to allow easy integration in tools that need to use thesauri. this has been done by combining the information management with specific graphical interfaces in different black-box components. among the provided components there are a hierarchical viewer, an alphabetic viewer, a list viewer, a searcher, and an editor, but more components can be constructed if needed. the use of the gui layer as a library of reusable graphical components makes it possible to create, with minimum effort, different tools that are able to manage thesauri under different user requirements, allowing also the integration of this technology in other applications that need controlled vocabularies to improve their functionality. for example, in a metadata creation tool, it can be used to provide the graphical component to select controlled values from thesauri and automatically insert them in the metadata. it also can be used to provide the list of possible values to use in a web search system, or to provide a thesaurus-based navigation of a collection of resources in an exploratory search system. figure 6 shows the integration process of a thesaurus visualization component in an external tool. the provided thesaurus components have been constructed following the java beans philosophy (reusable software components that can be manipulated visually in a builder tool), where a component is a black box with methods to read and change its state that can be reused when needed. here, each thesaurus component is a thesaurusbean that can be directly inserted in a graphical application to use its functionality (visualize or edit thesauri) in a very simple way. the thesaurusbeans are provided by the thesaurusbeanmanager that, given the parameters of the thesaurus to visualize and the type of visualization, returns the most adequate component to use.

figure 6. gui component integration: a desktop tool asks the thesaurusbeanmanager for a thesaurusbean (type: tree, thesaurus: gemet) and embeds the returned component.
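the thesaurusbean and thesaurusbeanmanager names come from the article, but their method signatures are not given, so the call below (getBean with a visualization type and a thesaurus name) is an assumed api written only to show the black-box integration idea; the stand-in manager class exists only so the sketch compiles.

import javax.swing.JFrame;
import javax.swing.JPanel;

// hypothetical sketch: the class names ThesaurusBeanManager/ThesaurusBean come from the
// article, but getBean(...) and its arguments are assumed signatures, not the real api.
public class MetadataToolIntegration {
    public static void main(String[] args) {
        // ask the manager for a hierarchical (tree) viewer of the gemet thesaurus
        ThesaurusBeanManager manager = new ThesaurusBeanManager();
        JPanel gemetTree = manager.getBean("TREE", "GEMET");

        // drop the returned component into the host application's window
        JFrame frame = new JFrame("keyword selection");
        frame.getContentPane().add(gemetTree);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.pack();
        frame.setVisible(true);
    }
}

// stand-in for the real manager so the sketch compiles on its own
class ThesaurusBeanManager {
    public JPanel getBean(String visualizationType, String thesaurusName) {
        return new JPanel(); // the real implementation would return the requested viewer bean
    }
}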
■ description of thmanager functionality
the thmanager tool is a desktop application that is able to manage thesauri stored in skos. as regards the installation requirements, the application requires 100 mb of free space on the hard disk. the ram and cpu requirements depend greatly on the size and the number of thesauri loaded in the tool. considering the number and size of thesauri used as a testbed in section 6, ram consumption ranges from 256 to 512 mb, and with a 3 ghz cpu (for example, a pentium iv) the load times for the bigger thesauri are acceptable. however, if the thesauri are smaller, ram and cpu requirements decrease, and the tool is able to operate on a computer with just a 1 ghz cpu (for example, a pentium iii) and 128 mb of ram.

given that the management of thmanager is metadata oriented, the first window in the application shows a table including the metadata records describing all the thesauri stored in the system (figure 7). the selection of a record in this table indicates to the rest of the components the selected thesaurus. the creation or deletion of thesauri also is provided here. the only operation that can be performed when no record is selected is to import a new thesaurus stored in skos. to import it, the name of the skos file must be provided. the import tool also contains the option to interrelate the imported thesaurus to the concept core. the metadata of the thesaurus are extracted from inside the skos file if they are available, or they can be provided in an associated xml metadata file. if no metadata record is provided, the application generates a new one with minimum information, using the name of the skos file as a base.

figure 7. thesaurus selector

once the user has selected a thesaurus, its metadata or content can be visualized and modified, exported to skos, or, as commented before, deleted. with respect to the metadata describing a thesaurus, a metadata viewer visualizes the metadata in html, and a metadata editor allows the editing of metadata following the thesaurus metadata profile described in the metadata-driven design section (figure 8 shows a screenshot of the metadata editor). different html views can be provided by adding more css files to the application. the metadata editor is customizable: to add or delete metadata elements in the metadata editor window, it is only necessary to modify the description of the iemsr profile for thesauri included in the application.

figure 8. thesaurus metadata editor

the main functionality of the tool is to visualize the thesaurus structure, showing all properties of concepts and allowing navigation by relations (see figure 9). here, different read-only viewers are provided. there is an alphabetic viewer that shows all the concepts ordered by the preferred label in one language. a hierarchical viewer provides navigation by broader and narrower relations. additionally, a hypertext viewer shows all properties of a concept and provides navigation by all its relations (broader, narrower, and related) via hyperlinks. finally, there also is a search system that allows the typical searches needed for thesauri (equals, starts with, contains). currently, search is limited to preferred labels in the selected language, but it could be extended to allow searches by other properties, such as synonyms, definitions, or scope notes.
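the three search modes just mentioned (equals, starts with, contains) over the preferred labels of the selected language can be sketched as a simple scan of the per-language label list kept by the persistence model; the code below is illustrative and is not thmanager's search implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// illustrative sketch of the three search modes mentioned above (equals, starts with,
// contains), run over the preferred labels of the selected language.
public class LabelSearch {
    public enum Mode { EQUALS, STARTS_WITH, CONTAINS }

    // labels: the preferred labels of every concept in the selected language
    public static List<String> search(List<String> labels, String query, Mode mode) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String label : labels) {
            String l = label.toLowerCase(Locale.ROOT);
            boolean match =
                    (mode == Mode.EQUALS && l.equals(q)) ||
                    (mode == Mode.STARTS_WITH && l.startsWith(q)) ||
                    (mode == Mode.CONTAINS && l.contains(q));
            if (match) {
                hits.add(label);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("alloy", "aluminium", "water pollution");
        System.out.println(search(labels, "al", Mode.STARTS_WITH)); // prints [alloy, aluminium]
    }
}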
all of these viewers are synchronized, so the selection of a concept in one of them produces the selection of the same concept in the others. the layered architecture described previously allows these viewers to be reused in many situations, including other parts of the thmanager tool. for example, in the thesaurus metadata editor described before, the thesaurus viewer is used to facilitate the selection of values for the subject section of the metadata. also, in the thesaurus editor shown later, the thesaurus viewer simplifies the selection of a concept related (by some kind of relation) to the selected one, and it provides a preview of the hierarchical viewer to help detect wrong relations.

the third available operation is to edit the thesaurus structure. here, to create a thesaurus following the skos model, an edition component is provided (see figure 10). the graphical interface shows a list with all the concepts created in the selected thesaurus, allowing the creation of new ones (providing their uris) or the deletion of selected ones. once a concept has been selected, its properties and relations to other concepts are shown, allowing the creation of new ones and the deletion of others. to facilitate the creation of relations between concepts, a selector of concepts (based on the thesaurus viewer) is provided, allowing the user to add related concepts without manually typing the uri of the associated concept. also, to check whether the created thesaurus is correct, a preview of the hierarchical viewer can be shown, allowing the user to easily detect problems in the broader and narrower relations.

figure 9. thesaurus concept selector
figure 10. thesaurus concept editor

with respect to the interrelation functionality, at the moment the mapping obtained is shown in the thesaurus viewers, but the navigation between equivalent concepts of two thesauri must be done manually by the user. however, a navigation component still under development will allow the user to jump from a concept in a thesaurus to concepts in others that are mapped to the same concept in the common core. as mentioned before, for efficiency, the format used to store the thesauri in the repository is binary, but the interchange format used is skos. so a module for thesauri importation and exportation is provided. this module is able to import from and export to skos.
in addition, if the thesaurus has been interrelated with respect to the concept core, the module is able to export its mapping to the concept core using the extended version of skos mapping described above.

■ results of the work
this section shows some experiments performed with the thmanager tool for the storage and management of a selected set of thesauri. in particular, this set of thesauri is relevant in the context of the geographic information community. the increasing relevance of geographic information for decision-making and resource management in different areas of government has promoted the creation of geo-libraries and spatial data infrastructures to facilitate distribution of and access to geographic information (nogueras-iso, zarazaga-soria, and muro-medrano 2005). in this context, complex metadata schemes, such as iso-19115, have been proposed for a full-detail description of resources. many of the metadata elements in these schemes are either constrained to a selected vocabulary (iso-639 for language encoding, iso-3166 for country codes, and so on), or the user is told to pick a term from the most suitable thesaurus. the problems with this second case are that typically the choice of thesauri is quite open, the thesauri are frequently large, and the exchange format of available thesauri is quite heterogeneous. in such a context, the thmanager tool has proven to be very useful to simplify the management of the used thesauri. at the moment, eighty kos, comprising thesauri and other types of controlled vocabulary, have been created or transformed to skos and managed through this tool. table 1 shows some of them, indicating their names (name column), the number of concepts (nc column), their total number of properties and relations (np and nr columns), and the number of languages in which concept properties are provided (nl column). to give an idea of the cost of loading these structures, the sizes of the skos and binary files (ss and sb columns) are provided in kilobytes (kb). additionally, table 1 compares the performance time of thmanager with respect to other tools that load the thesauri directly from an rdf file using the jena library (time performance has been obtained using a 3 ghz pentium iv processor). for this purpose, three different load times (in seconds) have been computed. the bt column contains the load time of binary files without the cost of creating the gui for the thesauri viewers. the lt column contains the total load time of binary files (including the time of gui creation and drawing). the jt column contains the time spent by a hypothetical rdf-based editor tool to invoke jena and load the rdf skos files containing the thesauri into its memory model (it does not include gui creation). the difference between the bt and lt columns shows the time used to draw the gui once the thesauri have been loaded in memory. the difference between the bt and jt columns shows the gain in time of using binary storage instead of an rdf-based one.

table 1. sizes of some thesauri and other types of vocabularies (nc: concepts; np: properties; nr: relations; nl: languages; lt: total load time in seconds; bt: binary load time in seconds; jt: jena rdf load time in seconds; ss: skos file size in kb; sb: binary file size in kb)
name | nc | np | nr | nl | lt | bt | jt | ss | sb
adl ftt | 210 | 210 | 408 | 1 | 0.4 | 0.047 | 0.062 | 103 | 41
isoc-g | 5,136 | 5,136 | 1,026 | 1 | 2.4 | 1.063 | 1.797 | 2,796 | 1,332
iso-639 | 7,599 | 16,247 | 0 | 6 | 5.1 | 1.969 | 2.89 | 3,870 | 3,017
unesco | 8,600 | 13,281 | 21,681 | 3 | 2.1 | 1.406 | 2.984 | 4,034 | 2,135
epsg | 4,772 | 9,544 | 0 | 1 | 1.8 | 0.969 | 1.796 | 2,935 | 1,682
agrovoc | 16,896 | 103,484 | 30,361 | 3 | 7.5 | 4.953 | 14.75 | 15,859 | 5,089
eurovoc | 6,649 | 196,391 | 20,861 | 15 | 11.1 | 9.266 | 15.828 | 18,442 | 11,483
etu | 44,991 | 89,980 | 89,976 | 2 | 13.3 | 10.625 | 17.844 | 23,828 | 10,412
gemet | 5,244 | 326,602 | 12,750 | 21 | 13.7 | 11.828 | 25.61 | 28,010 | 15,048
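the jt and bt columns can be reproduced, at least in spirit, by timing a jena parse of the skos file against the deserialization of a previously saved binary snapshot of the same thesaurus. the sketch below assumes java serialization for the binary side and uses placeholder file names; thmanager's actual binary format is not described in the article.

import java.io.FileInputStream;
import java.io.ObjectInputStream;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

// rough sketch of the jt-versus-bt comparison discussed above: parse the skos rdf file
// with jena, then deserialize a binary snapshot of the same thesaurus.
public class LoadTimeSketch {
    public static void main(String[] args) throws Exception {
        long t0 = System.currentTimeMillis();
        Model rdfModel = RDFDataMgr.loadModel("gemet.rdf");   // jt: rdf parse with jena (placeholder file)
        long jt = System.currentTimeMillis() - t0;

        long t1 = System.currentTimeMillis();
        Object binaryModel;
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("gemet.bin"))) {
            binaryModel = in.readObject();                     // bt: binary deserialization (placeholder file)
        }
        long bt = System.currentTimeMillis() - t1;

        System.out.println("statements parsed from rdf: " + rdfModel.size());
        System.out.println("jt = " + jt + " ms, bt = " + bt + " ms");
    }
}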
the thesauri shown in the table are the adl feature types thesaurus (adl ftt), the isoc thesaurus of geography (isoc-g), iso-639, the unesco thesaurus (unesco 1995), the ogp surveying and positioning committee code lists (epsg) (ogp 2006), the multilingual agricultural thesaurus (agrovoc), the european vocabulary thesaurus (eurovoc) (eupo 2005), the european territorial units (spain and france) (etu), and the general multilingual environmental thesaurus (gemet). they have been selected because they have different sizes and can be used to show how the load time evolves with the thesaurus size. among them, gemet and agrovoc can be highlighted. although they are provided as skos, they include nonstandard extensions that we have transformed to standard skos relations and properties. eurovoc and unesco are examples of thesauri provided in formats other than skos that we have completely transformed into skos. the former was in an xml-based format, and the latter used a plain-text format. another thesaurus transformed to skos is the european territorial units, which contains the administrative political units of spain and france. here, the original source was a collection of heterogeneous documents that contained parts of the needed information and have been processed to generate a skos file. some classification schemes also have been transformed to skos, such as iso-639 and the different epsg codes for coordinate reference systems (including datums, ellipsoids, and projections). with respect to controlled vocabularies created (by the authors) in skos using the thmanager tool, there is an extended version of the adl feature types that includes a more detailed classification of feature types, and there are different glossaries used for resource classification.

figure 11 depicts the comparison of the different load times shown in table 1 with respect to the size of the rdf skos files. the order of the thesauri in the figure is the same as in table 1.

figure 11. thesaurus load times: load time (s) versus skos file size (kb), comparing rdf loading with jena against thmanager binary loading.

it can be seen that the time to construct the model using a binary format is almost half the time spent to create the model using an rdf file. in addition, once the binary model is loaded, the time to generate the gui is not very dependent on thesaurus size. this is possible thanks to the redundant information added to facilitate access to top concepts and to speed up loading of the alphabetic viewer. this redundant information produces an overhead in the load of the model, but without it the drawing time would be much worse, as the tool would have to generate it on the fly. however, in spite of the improvements, for the larger thesauri considered, the load time starts to be long, given that it includes loading the entire structure of the thesaurus into memory and creating the objects used to manage it quickly once loaded. but, once it is loaded, future accesses are immediate (quicker than 0.5 seconds). these accesses include opening it again, navigating by thesaurus relations, changing the visualization language, and searching concepts by their preferred labels. to minimize the load time, thesauri can be loaded in the background when the application is launched, reducing in that way the user's perception of the load time. another interesting aspect in figure 11 is the peak of the third element.
it corresponds with the iso­639 classifica­ tion scheme. it has the special characteristic of not having hierarchy and having many notations. these two character­ istics produce a little increase in the model load time, given that the top concepts list contains all the concepts and the notations are more complex than other relations. but most of the time is used to generate the gui of the tree viewer. the tree viewer gets all the concepts that are top terms, and for each one it asks for their preferred labels in the selected language and sorts them alphabetically to show the first level of the tree. this is fast for a few hundred concepts, but not for the 7,599 in the iso­639. however, this problem could be easily solved if the metadata contained a descrip­ tion of the type of kos to visualize. if the tool knew that the kos does not have broader and narrower relations, it could use the structures used to visualize the alphabetic list, which are optimized to show all of the kos concepts rapidly, instead of trying to load it as a tree. the persistence approach used has the advantage of not requiring external persistence systems, such as a dbms, and providing rapid access after loading, but it has the drawback of loading all thesauri in memory (in time and space). so, for much bigger thesauri, the use of some kind of dbms would be necessary. if this change were necessary, minimum modifications would be needed (one class). however, if not all the concepts are loaded, the alphabetic viewer (shows all the concepts) would have to be updated (for example, showing the concepts by pages) or it would become too slow to work with it. ■ conclusions this article has presented a tool for managing the the­ sauri needed in a digital library, for creating metadata, and for running search processes using skos as the interchange format. this work revises the tools that are available to edit thesauri, highlighting the lack of a formalized way to exchange thesauri and the difficulty of integrating those tools in other environments. this work selects skos from the available interchange formats for thesauri as the most promising format to become a standard for skos repre­ sentation, and highlights the lack of tools that are able to manage it properly. the thmanager tool is offered as the solution to these problems. it is an open source tool that can manage the­ sauri stored in skos, allowing their visualization and editing. thanks to the layered architecture, its components can be easily integrated in other applications that need to use thesauri or other controlled vocabularies. additionally, the components can be used to control the possible values used in a web search service to facilitate traditional or exploratory searches based on a controlled vocabulary. the performance of the tool is proved through a series of experiments on the management of a selected set of thesauri. this work analyzes the features of this selected set of thesauri and compares the efficiency of this tool with respect to other tools that load the thesauri directly from a rdf file. in particular, it is shown that the internal representation used by thmanager helps to decrease the time spent for the graphical loading of thesauri, facilitating navigation of the thesaurus contents as well as other typical operations, such as sorting or change of visual­ ization language. 
additionally, it is worth noting that the tool can be used as a library of components to simplify the integration of thesauri in other applications that require the use of controlled vocabularies. thmanager has been integrated within the open source catmdedit tool (zarazaga-soria et al. 2003), a metadata editor tool for the documentation of geographic information resources (metadata compliant with the iso 19115 geographic information metadata standard). the thesaurusbeans provided in the thmanager library have been used to facilitate keyword selection for some metadata elements. the thmanager component library also has contributed to the development of catalog search systems guided by controlled vocabularies. for instance, it has been used to build a thematic catalog in the sdiger project (zarazaga-soria et al. 2007). sdiger is a pilot project on the implementation of the infrastructure for spatial information in europe (inspire) for the development of a spatial data infrastructure to support access to geographic information resources concerned with the european water framework directive. thanks to the thmanager components, the thematic catalog allows browsing of resources by means of several multilingual thesauri, including gemet, unesco, agrovoc, and eurovoc.

future work will enhance the functionalities provided by thmanager. first, the ergonomics will be improved to show connections between different thesauri. currently, these connections can be computed and annotated, but the gui does not allow the user to navigate them. as the base technology already has been developed, only a graphical interface is needed. second, the tool will be enhanced to support data types other than text (for example, images, documents, or other multimedia sources) for the encoding of concepts' property values. third, it has been noted that thesaurus concepts can evolve with time; thus, a mechanism for managing the different versions of thesauri will be necessary in the future. finally, improvements in usability also are expected. thanks to the component-based design of the thmanager widgets (thesaurusbeans), new viewers or editors can be readily created to meet the needs of specific users.

■ acknowledgments
this work has been partially supported by the spanish ministry of education and science through the projects tin2006-00779 and tic2003-09365-c02-01 from the national plan for scientific research, development, and technology innovation. the authors would like to express their gratitude to juan josé floristán for his support in the technical development of the tool.

references
american national standards institute (ansi). 1993. guidelines for the construction, format, and management of monolingual thesauri. ansi/niso z39.19-1993. revision of z39.19.
batschi, wolf-dieter, et al. 2002. superthes: a new software for construction, maintenance, and visualisation of multilingual thesauri. http://www.t-reks.cnr.it/docs/st_enviroinfo_2002.pdf (accessed sept. 6, 2007).

british standards institute (bsi). 1985. guide to establishment and development of multilingual thesauri. bs 6723.

british standards institute (bsi). 1987. guide to establishment and development of monolingual thesauri. bs 5723.

ceres/nbii. 2003. the ceres/nbii thesaurus partnership project. http://ceres.ca.gov/thesaurus/ (accessed june 12, 2007).

cross, phil, dan brickley, and traugott koch. 2001. rdf thesaurus specification. technical report 1011, institute for learning and research technology. http://www.ilrt.bris.ac.uk/discovery/2001/01/rdf-thes/ (accessed june 12, 2007).

denny, michael. 2002. ontology building: a survey of editing tools. xml.com. http://xml.com/pub/a/2002/11/06/ontologies.html (accessed june 12, 2007).

european environment agency (eea). 2004. general multilingual environmental thesaurus (gemet). version 2.0. european environment information and observation network. http://www.eionet.europa.eu/gemet/rdf (accessed june 12, 2007).

european union publication office (eupo). 2005. european vocabulary (eurovoc). publications office. http://europa.eu/eurovoc/ (accessed june 12, 2007).

food and agriculture organization of the united nations (fao). 2006. agriculture vocabulary (agrovoc). agricultural information management standards. http://www.fao.org/aims/ag%20alpha.htm (accessed june 12, 2007).

gonzalo, julio, et al. 1998. applying eurowordnet to cross-language text retrieval. computers and the humanities 32, no. 2/3 (special issue on eurowordnet): 185–207.

heery, rachel, et al. 2005. jisc metadata schema registry. in 5th acm/ieee-cs joint conference on digital libraries, 381. new york: acm pr.

hill, linda, and qi zheng. 1999. indirect geospatial referencing through place names in the digital library: alexandria digital library experience with developing and implementing gazetteers. in asis '99: proceedings of the 62nd asis annual meeting: knowledge: creation, organization, and use, 57–69. medford, n.j.: information today, for the american society for information science.

hodge, gail. 2000. systems of knowledge organization for digital libraries: beyond traditional authority files. washington, d.c.: the digital library federation.

international organization for standardization (iso). 1985. guidelines for the establishment and development of multilingual thesauri. iso 5964.

international organization for standardization (iso). 1986. guidelines for the establishment and development of monolingual thesauri. iso 2788.

international organization for standardization (iso). 2002. codes for the representation of names of languages. iso 639.

international organization for standardization (iso). 2003. information and documentation—the dublin core metadata element set. iso 15836:2003.

janée, greg, satoshi ikeda, and linda l. hill. 2003. the adl thesaurus protocol. http://www.alexandria.ucsb.edu/~gjanee/thesaurus/ (accessed june 12, 2007).

lesk, michael. 1997. practical digital libraries: books, bytes, and bucks. san francisco: morgan kaufmann.

matthews, brian m., et al. 2001. internationalising data access through limber.
in third international workshop on internationalisation of products and systems, 1–14. milton keynes (uk). http://epubs.cclrc.ac.uk/bitstream/401/limber_iwips.pdf (accessed june 12, 2007).

miles, alistair, and dan brickley, eds. 2004. skos mapping vocabulary specification. w3c. http://www.w3.org/2004/02/skos/mapping/spec/2004-11-11.html (accessed june 12, 2007).

miles, alistair, brian matthews, and michael wilson. 2005. skos core: simple knowledge organization for the web. in 2005 dublin core annual conference—vocabularies in practice, 5–13. madrid: universidad carlos iii de madrid.

miller, george a. 1990. wordnet: an on-line lexical database. int. j. lexicography 3: 235–312.

mindswap group. 2006. swoop: a hypermedia-based featherweight owl ontology editor. maryland information and network dynamics lab, semantic web agents project. http://www.mindswap.org/2004/swoop/ (accessed june 12, 2007).

nogueras-iso, javier, francisco javier zarazaga-soria, and pedro rafael muro-medrano. 2005. geographic information metadata for spatial data infrastructures—resources, interoperability, and information retrieval. new york: springer verlag.

noy, natalya f., ray w. fergerson, and mark a. musen. 2000. the knowledge model of protégé2000: combining interoperability and flexibility. in knowledge engineering and knowledge management: methods, models, and tools: 12th international conference, ekaw 2000, juan-les-pins, france, october 2–6, 2000: proceedings, 1–20 (lecture notes in computer science, 1937). new york: springer.

ogp surveying & positioning committee. 2006. surveying and positioning. http://www.epsg.org/ (accessed june 12, 2007).

semantic web advanced development for europe (swad-europe). 2001. semantic web advanced development for europe thesaurus activity. http://www.w3.org/2001/sw/europe/reports/thes (accessed june 12, 2007).

tolosana-calasanz, r., et al. 2006. semantic interoperability based on dublin core hierarchical one-to-one mappings. international journal of metadata, semantics, and ontologies 1, no. 3: 183–88.

taylor, mike. 2004. the zthes specifications for thesaurus representation, access, and navigation. http://zthes.z3950.org/ (accessed june 12, 2007).

united nations educational, scientific, and cultural organization (unesco). 1995. unesco thesaurus: a structured list of descriptors for indexing and retrieving literature in the fields of education, science, social and human science, culture, communication and information. paris: unesco publ.

u.s. library of congress, network development and marc standards office. 2004. marc standards. http://www.loc.gov/marc/ (accessed june 12, 2007).

wielemaker, jan, guus schreiber, and bob wielinga. 2005. using triples for implementation: the triple20 ontology-manipulation tool (lecture notes in computer science, 3729): 773–85. new york: springer.

zarazaga-soria, francisco javier, et al. 2003. a java tool for creating iso/fgdc geographic metadata. in geodaten- und geodiensteinfrastrukturen—von der forschung zur praktischen anwendung: beiträge zu den münsteraner gi-tagen, 26./27. juni 2003 (ifgi prints, 18). münster, germany: institut für geoinformatik, universität münster.

zarazaga-soria, francisco javier, et al. 2007. providing sdi services in a cross-border scenario: the sdiger project use case. in research and theory in advancing spatial data infrastructure concepts, 113–26. redlands, calif.: esri.
the mosc project: using the oai-pmh to bridge metadata cultural differences across museums, archives, and libraries
eulalia roel

eulalia roel (eulalia.roel@gmail.com) is coordinator of information resources at the federal reserve, atlanta.

the metascholar initiative of emory university libraries, in collaboration with the center for the study of southern culture, the atlanta history center, and the georgia music hall of fame, received an institute of museum and library services grant to develop a new model for library-museum-archives collaboration. this collaboration will broaden access to resources for learning communities through the use of the open archives initiative protocol for metadata harvesting (oai-pmh). the project, titled music of social change (mosc), will use oai-pmh as a tool to bridge the widely varying metadata standards and practices across museums, archives, and libraries. this paper will focus specifically on the unique advantages of the use of oai-pmh to concurrently maximize the exposure of metadata emergent from varying metadata cultures.

the metascholar initiative of emory university libraries, in collaboration with the center for the study of southern culture, the atlanta history center, and the georgia music hall of fame, received an institute of museum and library services grant to develop a new model for library-museum-archives collaboration to broaden access to resources for learning communities through the use of the open archives initiative protocol for metadata harvesting (oai-pmh).1 the collaborators of the project, entitled music of social change (mosc), are creating a subject-based virtual collection concerning music and musicians associated with social-change movements such as the civil-rights struggle. this paper will specifically focus on the advantages offered by oai-pmh in amalgamating and serving metadata from these institutional sources that are significantly different in kind.2 there has been a great deal of discussion within the library community as to the possibilities oai-pmh holds for harvesting, aggregating, and then disseminating research metadata. however, in reality, only a few institutions (be they museums, archives, or libraries) have actually begun to utilize oai-pmh to this end. there are some practical, historical barriers to implementing any shared system for distributing metadata across institutions that are, more than in degree, different in kind. one of these significant differences is that of metadata cultures and practices. libraries have traditionally assigned metadata incrementally at the item level within their collection(s). the strength of this model is that at least a minimal amount of metadata is assigned to a very high percentage of items within the collection. the challenge of such a system is that, for such metadata records to interoperate within a shared database and through a common interface (for example, the traditional union catalog), the metadata fields have been quite rigidly defined compared to those within archival and museum environments. due to tradition as well as the sheer volume of items collected by libraries, metadata at the item level are not greatly detailed or contextualized. often, items within library collections lack robust relational mapping to other items within or outside of the collection, as is done, for example, in archival processing. content contextualization is highly valued by archival metadata practices and culture as the central tenet of metadata creation. items at a subcollection level almost always have metadata derivative from and deferential to the collection-level metadata.
the great benefit of archival practices in metadata assignment is a contextualization of content that reflects the background, the topographic place in time and space of a given portion of a collection, and its organic, emergent relationship to the whole. the weaknesses of this model are a great inconsistency in description details and variables (at the collection and subcollection levels), as well as very disparate levels of granularity within the hierarchy of the structure of a collection at which metadata are assigned. such disparities among institutional types feed an unnecessary level of misunderstanding by libraries of the metadata culture and aims of archives as well as those of museums. museums often have very skeletal documented (as opposed to undocumented) metadata about their collections or the objects therein. often museums are not funded to make metadata on their collections freely available. it is common, in fact, for curatorial staff to view metadata as intellectual property to which they serve as gatekeepers, reflecting a professional value placed upon contextualizing materials for users. this is done on a user-by-user or exhibition-by-exhibition basis, depending on user background or the thesis of a given exhibition. additionally, museums perceive information on the aboutness of their collections to be a class of capital with which they can always potentially cost-recover or generate income. within the culture of museums, staff have traditionally been disinclined to make their collections available in an unmediated manner. additionally, there has been resistance to documenting information about collections in a systematic way. there is even greater resistance to adhering to any prescriptions on metadata as would be required for compliance with even the most minimally structured database. such regulation would discriminate against the nuanced information required for each and every object within a collection.

■ why oai-pmh to bridge these cultures?

oai-pmh was selected by the mosc project as a means to bridge some of these substantial disparities. the protocol is often mistakenly assumed to function only with metadata expressed as unqualified dublin core (dc). in fact, the protocol functions with any metadata format expressed in extensible markup language (xml); this is the minimal requirement for content to serve metadata through oai-pmh. this includes those formats that have been well received by institutions other than libraries, such as xml encoded archival description (ead) as it is used in archives. as per section 4.2 of the oai-pmh guidelines for repository implementers, communities are able to develop their own collection description xml schemas for use within <description> elements. if all that is desired is the ability to include an unstructured textual description, then it is recommended that repositories use the dublin core description element. seven existing schemes are: dublin core, encoded archival description (ead), the eprints schema, rslp collection description schema, uddi/wsdl, marc21, and the branding schema.3 the oai protocol has often been partnered with unqualified dc metadata, as this is the most minimal metadata structure necessary for participation in an oai harvesting system.
not only are these dc fields unqualified, but no fields are actually required. no structure or regulations are codified beyond requiring metadata contributors to adhere to this unqualified metadata schema. therefore, the oai protocol requires minimal technology support and resources at any given contributing site (such support varying more widely across institutions than even their metadata practices themselves). this maximizes flexibility in metadata contribution, as well as interoperability across the collective data pool that a user can search. granted, this unregulated framework does come at the cost of inconsistency in metadata detail and quality. however, the great advantage of such nominal requirements is that they enable contributors with minimal metadata-encoding practices to participate in the metadata collaborative. following is an example of a record as it may appear in the mosc collection:
<record>
  <header>
    <identifier>oai:atlantahistorycenter.com:10</identifier>
    <datestamp>2003-03-31</datestamp>
    <setSpec>south:blues</setSpec>
    <setSpec>south:mississippi-delta-region</setSpec>
  </header>
  <metadata>
    <oai_dc:dc>
      <dc:title>long hall recordings</dc:title>
      <dc:creator>morris, william</dc:creator>
      <dc:subject>blues</dc:subject>
      <dc:description>comment: sound amateur recording</dc:description>
      <dc:date>2003-05-16</dc:date>
      <dc:type>sound recording</dc:type>
      <dc:identifier>http://atlantahistorycenter.com/porcelain/10</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
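as a rough illustration of how a service provider might pull records like the one above from the project's data providers, the following sketch issues a standard oai-pmh listrecords request and extracts a few unqualified dc fields. the base url shown is hypothetical, and the sketch deliberately omits resumption-token paging and error handling; it is not code from the mosc project itself.

from urllib.request import urlopen
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records(base_url, metadata_prefix="oai_dc"):
    """yield (identifier, title, creator) tuples from a single listrecords response."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    with urlopen(f"{base_url}?{query}") as response:
        tree = ET.parse(response)
    for record in tree.iter(f"{OAI}record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier")
        title = record.findtext(f".//{DC}title")      # may be absent: unqualified dc requires no fields
        creator = record.findtext(f".//{DC}creator")
        yield identifier, title, creator

# hypothetical data-provider endpoint:
# for rec in list_records("http://example.org/oai"):
#     print(rec)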
additionally, with no fields required by the dc schema, institutions have absolute discretion as to what metadata are exposed if this is a concern (as it may be for privacy considerations in archives or for intellectual-property concerns in museums). however, one of the great strengths of implementing oai-pmh is that, while the threshold for regulating metadata is low, the protocol can also handle any metadata format expressed in xml, including data formats significantly more structured than dc: for example, ead, text encoding initiative (tei), and tei lite-defined documents. scholars are then able to access these scholarly objects via one point, while still being able to collectively access and utilize all metadata objects available in all collections, from the most to the least robust. the aim of the mosc project participants in selecting oai-pmh is to maximize participation from fairly disparate kinds of organizations, with equally disparate kinds of metadata cultures and practices. in comparison to other, currently available methods of metadata aggregation, oai-pmh is maximally forgiving of discordant metadata suppliers; thereby, the hope is, metadata contributions are maximized. concurrently, the protocol allows for highly robust metadata formats. in some aggregated systems, metadata objects are stripped down as the cost of inclusion; this need is eliminated when oai-pmh is utilized. the use of the protocol allows for the inclusion of objects consisting of the most skeletal unqualified dublin core elements, while still accommodating the most complicated metadata objects. optimally, this is a means to achieve a critical mass of contributed resources that will enable end users to utilize the mosc project as the premier site and a primary resource for information on materials about music and musicians associated with social-change movements.

■ acknowledgment

the author would like to express her sincerest gratitude to the institute of museum and library services for funding the music of social change project.

references

1. "metascholar: an emory university digital library research initiative," emory university libraries web site. accessed sept. 1, 2004, http://metascholar.org/; "the center for southern culture," university of mississippi web site. accessed sept. 1, 2004, www.olemiss.edu/depts/south/; "atlanta history center," atlanta history center web site. accessed sept. 1, 2004, www.atlantahistorycenter.com/; "georgia music hall of fame," georgia music hall of fame web site. accessed sept. 1, 2004, www.gamusichall.com/home.html; "institute of museum and library services: library-museum collaboration," institute of museum and library services web site. accessed sept. 1, 2004, www.imls.gov/grants/l-m/index.htm.

2. "implementation guidelines for the open archives initiative protocol for metadata harvesting," open archives initiative web site. accessed sept. 1, 2004, www.openarchives.org/oai/openarchivesprotocol.html#introduction.

3. "4.2 collection and set descriptions," open archives initiative web site. accessed sept. 1, 2004, www.openarchives.org/oai/2.0/guidelines-repository.htm#setdescription.

an assessment of student satisfaction with a circulating laptop service
louise feldmann, lindsey wess, and tom moothart

louise feldmann (louise.feldmann@colostate.edu) is the business and economics librarian at colorado state university libraries; she serves as the college liaison librarian to the college of business. lindsey wess (lindsey.wess@colostate.edu) coordinates assistive technology services and manages the information desk and the electronic information center at colorado state university libraries. tom moothart (tmoothar@library.colostate.edu) is the coordinator of on-site services at colorado state university libraries.

since may of 2000, colorado state university's (csu) morgan library has provided a laptop computer lending service.
in five years the service had expanded from 20 to 172 laptops. although the service was deemed a success, users complained about slow laptop startups, lost data, and lost wireless connections. in the fall of 2005, the program was formally assessed using a customer satisfaction survey. this paper discusses the results of the survey and changes made to the service based on user feedback. colorado state university (csu) is a land-grant institution located in fort collins, colorado. the csu libraries consist of the morgan library, the main library on the central campus; the veterinary teaching branch hospital library at the veterinary hospital campus; and the atmospheric branch library at the foothills campus. in 1997, morgan library completed a major renovation and expansion which provided a designated space for public desktop computers in an information commons environment. the library called this space the electronic information center (eic). due to the popularity of the eic, and with the intent of expanding computer access without expanding the computer lab, library staff began to explore the implementation of a laptop checkout service in 2000. library staff used heather lyle's (1999) article "circulating laptop computers at west virginia university" as a guide in planning the service. development funds were used to purchase twenty laptop computers, and the 3com corporation donated fifteen wireless network access points. the laptops were to be used in morgan library on a wireless network maintained by the library technology services department. these computers were to be circulated from the loan desk, the same desk used to check out books. although the building is open to the public, use of the laptops was limited to university students and staff and for library in-house use only. all the public desktop computers and laptops use microsoft windows and microsoft office. maintaining the security of the libraries' network and students' personal data in a wireless environment was paramount. to maintain a secure computing environment and present a standardized computing experience in the library, windows xp group policies were applied. currently, the laptop software is updated at least every semester using symantec ghost. ghost copies a standardized image to every laptop even when the library owns a variety of computer models from the same manufacturer. additionally, due to concerns over wireless computer security, morgan library implemented cisco's virtual private network (vpn) in 2004. the laptop service was launched in may 2000. more than 22,000 laptop transactions occurred in the initial year. since its inception, the use of the morgan library laptop service and the number of laptops available for checkout have steadily grown. using student technology funds, the service had grown to 172 laptops and ten presentation kits consisting of a laptop, projector, and a portable screen. circulation during the fall 2005 semester totaled 30,626 laptops and 102 presentation kits. in fiscal year 2005, 66,552 laptops and presentation kits were checked out. based on the high circulation statistics and anecdotal evidence, the service appeared to be successful. although morgan library replaced laptops every three years and upgraded the wireless network, laptop support staff noted that users complained of slow laptop startups, lost data, and lost wireless connections. the researchers also noted that large numbers of users queued at the circulation desk at 5:00 p.m.
even though large numbers of desktop computers were available in the eic. a customer satisfaction survey was developed to assess the service and test library staff's assumptions about the service. csu had a student population of 25,616 students at the time of the survey.

■ literature review

much of the published literature discussing laptop services focuses on the implementation of laptop lending programs and was published from 2001 to 2003, when many libraries were beginning this service (allmang 2003; block 2001; dugan 2001; myers 2001; oddy 2002; vaughan and burnes 2002; williams 2003). these articles deal primarily with topics such as how to deal with start-up technological, staffing, and maintenance issues. they have minimal discussion of the service post-implementation. researchers who have surveyed users of university laptop lending services include direnzo (2002), lyle (1999), jordy (1998), block (2001), oddy (2002), and monash university's caulfield library (2004). direnzo from the university of akron only briefly discusses a survey they conducted, with some information about additional software added as a result of their user comments. lyle from west virginia university discusses the percentage of respondents to particular questions such as what applications were used, problems encountered, and overall satisfaction with the service. jordy's report provides in-depth analysis of the survey results from the university of north carolina at chapel hill, but the focus of his survey is on the laptop service's impact on library employee work flow. monash university's caulfield library survey focuses on wireless access and awareness of the program by patrons. other survey results found on university library web sites include southern new hampshire university library (west 2005) and murray state university library (2002). additionally, the monmouth university library web site (2003) provides discussion and written analysis of a survey they conducted prior to implementation of their service, a survey which was used to gather information and assess patron needs in order to aid in the construction and planning of their service. from the survey results discussed in the literature and posted on web sites, overall comments from users are very consistent with one another. most users indicate that they use a loaned laptop computer rather than a desktop computer for privacy and portability (lyle 1999; oddy 2002; west 2005). in addition, the responses from patrons are overwhelmingly positive and users appreciated having the service made available (lyle 1999; jordy 1998; west 2005). both the west virginia university and the university of north carolina at chapel hill surveys found that 98 percent of respondents would check out a laptop again (lyle 1999; jordy 1998). southern new hampshire university's survey indicated that 88 percent of those responding would check one out again (west 2005).
many respondents stated that a primary drawback of using the laptops was the slowness of connectivity (lyle 1999; monash 2004; murray state 2002). the primary use of the laptops, reported in the surveys, was microsoft word (lyle 1999; jordy 1998; oddy 2002). there is a lack of published literature regarding laptop lending customer satisfaction surveys and analysis. this could be due to the relative newness of many programs, the lack of university libraries that provide laptops, or the reliance on circulation statistics alone to assess the program. articles that discuss circulation and usage statistics as an assessment indicator to judge the popularity of their programs include direnzo (2002), dugan (2001), and vaughan and burnes (2002). based on high circulation statistics and positive anecdotal evidence, it may appear that library users are pleased with laptop programs, and perhaps there has been a hesitation to survey users on a program that is perceived by those in the library as successful.

■ results

with the strong emphasis on assessment at colorado state university, it was decided to formally survey laptop users on their satisfaction with the program. the survey was distributed by the access services staff when the laptops were checked out from october 28, 2005, to november 28, 2005. this was a voluntary survey, and the respondents were asked to complete one survey. users returned 173 completed surveys. undergraduates are the predominant audience for the laptop service; of the 173 returned surveys, 160 identified themselves as undergraduates. as shown in table 1, the responses indicated that the library has a core of regular laptop users, with 33 percent using the laptops at least daily and 82 percent using the laptops at least weekly. only 3 percent indicated that they were using a laptop for the first time. many laptop users also utilized the eic, with 67 percent responding that they use the information commons at least weekly (see table 2).

table 1. how often do you use a library laptop?
frequency              percentage
more than once a day       3%
daily                     30%
weekly                    49%
monthly                   15%
my first time              3%
n=172

table 2. how often do you use a library pc?
frequency              percentage
more than once a day       3%
daily                     20%
weekly                    44%
monthly                   20%
never                     13%
n=169

the laptops were initially purchased with the intent that they would be used to support student team projects. presentation kits with a laptop, projector, and portable screen were an extension of this idea and were also made available for checkout. surprisingly, only 15 percent of the respondents noted that they were using the laptop with a group. during evenings, it was observed by staff that students were regularly queuing and waiting for a laptop even though pcs were available in the library computer lab. figure 1 shows hourly use statistics for the desktop and laptop public computers. the usage of the desktop computers drops in the late afternoon, just as the use of the laptop computers increases.

figure 1. computer use statistics for may 1, 2006 (percentage of use by time of day, for desktop computers and checkout laptops).

students were asked why they chose a laptop rather than a library pc and were allowed to choose from multiple answers. as can be seen in table 3, most students noted the advantages of portability and privacy. five respondents wrote in the "other" category that they were able to work better in quieter areas, and ten mention that the computer lab workspace is limited. the dense use of space in the library computer lab has been noted by morgan library staff and students.

table 3. why did you choose to use a laptop rather than a library pc?
response                                                  number
portability                                                   41
privacy                                                       12
easier to work with a group                                    7
portability and privacy                                       54
portability and easier to work with a group                   10
portability, privacy, and easier to work with a group         12
the desktop surrounding each library pc only provides about three feet of workspace. one respondent explained the choice of laptop over pc was because "i can take it to a table and spread out my notes vs. on a library pc." for many users, the desktops are too crowded to spread research material, and the eic is too noisy for contemplative thought. as can be noted from the use statistics, the public laptop program has been a very popular library service. prior to the survey, the perception of the morgan library staff was that students were waiting in the evening for extended periods of time for a laptop. when the library expanded the laptop pool from 20 in 2000 to 172 in 2005, it had seemingly no effect on reducing the number of students waiting to use them. as can be seen in table 4, when asked how long they had waited for a laptop, 74 percent of the students said they had access to a laptop immediately, and 15 percent waited less than a minute. the survey was administered during the second busiest time of the year for the library, the month before thanksgiving break. in the open comments, one respondent stated that it was possible to wait forty-five minutes to an hour for a laptop, and another noted that "during finals weeks it is almost impossible to get one."
surprisingly, 79 percent of the users reported rarely or never returning a non-functioning laptop. in addition, the library technicians have reported that no problems have been found on some of the laptops returned for repair. some of the returned computers may be due to frustration with the slow connection to the wireless network. forty-five percent of respondents reported at least occasionally having problems connecting to the wireless network. from the inception of the laptop program, the library has experienced problems with the wireless technology. from its original fifteen wireless access points to its current twenty-nine, the library has struggled to meet the demand of additional library laptops and users’ personal laptops. many written comments on the surveys complained about the slow connection speed of the wireless network such as, “find a way to make the boot-up process faster. i need to wait about five minutes for it to be totally booted and ready to use.” even with the slow connection to the wireless network, 41 percent of students responding to the survey rated their satisfaction with the library’s laptop service as excellent and 49 percent rated their satisfaction as good (see table 7). n discussion even with 90 percent of our users rating the laptop service as good or excellent, the survey noted some problems that needed attention. the morgan library laptops seamlessly connect to a wireless network through a login script when the computer is turned on. a new script was written to table 4. how long did you wait before you were able to check out your laptop? response percentage i did not wait 74% less than one minute 15% one to four minutes 11% five to ten minutes 2% more than ten minutes 0% n=171 table 5. how often have you experience problems saving files, connecting to the wireless network, or had a laptop that locked up or crashed? frequency saving files wireless connection locked up or crashed often <1% 5% <1% occasionally 8% 40% 17% rarely 33% 32% 35% never 58% 24% 49% n= 165 165 163 table 6. how often have you returned a library laptop that was not working properly? frequency percentage often 4% occasionally 18% rarely 30% never 49% n=165 24 information technology and libraries | june 2008 allow the connection and authentication to the cisco virtual private network (vpn) client. during testing it was found that some laptops took as long as ten minutes to connect to the wireless network, which resulted in numerous survey respondents commenting on our slow wireless network. to help correct this problem, the library’s network staff changed each laptop’s user profile from a mandatory roaming profile to a local profile and simplified the login script. the laptops connected faster to the wireless network with the new script, but they still did not meet the students’ expectations. in the fall of 2006, the library network staff moved the laptops from vpn to wi-fi protected access (wpa) wireless security, and laptop login time to the wireless network dropped to under two minutes. the number of customer complaints dropped dramatically after implementing wpa. additional access points were purchased to improve connectivity in morgan library’s wireless “dead zones.” in january 2006, the university’s central computing services audited the wireless network after continued wireless connectivity complaints. the audit recommended reconfiguring the access points channel assignments. 
in many cases it was found that the same channel had been assigned to access points adjacent to each other, ultimately compromising laptop connectivity. the audit also discovered noise interference on the wireless network from a 2.4-ghz cordless phone used by the loan desk staff. the phone was replaced with a 5.8-ghz one, which has resulted in fewer dropped connections near the loan desk. supporting almost 200 laptops has introduced several problems in the library. the morgan library building was not designed to support the use of large numbers of laptops. because it is impractical for the loan desk to charge nearly 200 laptop batteries throughout the day, laptops available for checkout must be connected to electrical outlets. these are seldom near study tables, and students are forced to crawl underneath tables to locate power or stretch adapter cords across aisles. a space plan for the morgan library is being developed that will increase the number of outlets near study tables. in the meantime, 100 power strips were added to tables used heavily by laptop users. the loan desk staff is very efficient at circulating, but has less success at troubleshooting technical problems. when the laptop service was first implemented, large numbers of laptops were not available due to servicing reasons. the public laptop downtime was lowered by hiring additional library technology students. a one-day onsite repair service agreement was purchased from the manufacturer, which resulted in many equipment repairs being completed within 48 hours. in order to reduce the downtime further, a plan to replace some loan desk student workers with library technology students is being evaluated. the technology students will be able to troubleshoot connectivity and hardware problems with the users when they return the defective computers to the loan desk. if a computer needs additional service, it can be handled immediately, which will allow more laptops for checkout since fewer will be removed for repair. when the laptop service was first envisioned, it was seen as a great service for those working in groups. as can be seen in table 3, very few students are using the laptops in a group setting. in survey written comments, students emphasize that they enjoy the portability and privacy enabled by using a laptop. the morgan library eic is cramped and noisy, with the configuration allowing very little room for students to spread out research materials and notes for writing. the morgan library space plan takes these issues into consideration and recommends reconfiguring the eic to lessen the noise and provide writing space near computers. this is intended to improve the student library experience and encourage students to use the desktop computers during the evenings when lines form for the laptops. in order to decrease the current laptop queue at the loan desk, more laptops will be added. as a result of survey comments requesting apple computers, five mac powerbooks were added to the library's laptop fleet. in addition, as morgan library adds more checkout laptops and the number of students arriving on campus with wireless laptops increases, the wireless infrastructure will need to be upgraded. upgrading the wireless access points to the 802.11g standard has been implemented. updating each laptop with a new hard-drive image has become problematic as the number of laptops has increased.
the wireless network capacity is not large enough for the ghost software to transmit the image to multiple laptops, and so each laptop must be physically attached to the library network. initially, when library technology services attempted imaging many laptops at once, it took six to eight hours and required up to eight staff members. this method of large-scale laptop imaging was so network intensive that it had to be performed when the library was closed to avoid disrupting public internet use. now imaging the laptop fleet is done piecemeal, twenty to thirty laptops at a time, in order to minimize complications with the ghost process and multicasting through the network switches. due to the staff time required, laptop software is not updated as often as the users would like. technological solutions continue to be investigated that will decrease the labor and network intensity of imaging.

■ conclusion

the morgan library laptop service was established in 2000 and has been a very popular addition to the library's services. as an example of its popularity, in fiscal year 2005 the laptops circulated 66,552 times. student government continues to support the use of student technology fees to support and expand the fleet of laptops. this survey was an attempt to assess users' perceptions of the service and identify areas that need improvement. the survey found that students rarely wait more than a few minutes for a laptop, and in open-ended survey questions, students noted that they waited for computers only during peak use periods. while relatively few survey respondents experienced technical difficulties with the laptops and wireless network, slow wireless connection time was a concern that students noted in the open comments section of the survey. overall, the students gave the laptop service a very high rating. when asked to suggest improvements to the service, many respondents recommended purchasing more laptops. the libraries made several changes to improve the laptop service based on survey responses. changes have been made to the login script files, wireless network, and security protocol to speed and stabilize the wireless connection process. additional wireless access points will be added to the building, and all access points will be upgraded to the 802.11g standard. in addition, five mac powerbooks have been added to the fleet of windows-based laptops. the library continues to investigate new service models to circulate and maintain the laptops.

works cited

allmang, nancy. 2003. our plan for a wireless loan service. computers in libraries 23, no. 3: 20–25.

block, karla j. 2001. laptops for loan: the experience of a multilibrary project. journal of interlibrary loan, document delivery, and information 12, no. 1: 1–12.

direnzo, susan. 2002. a wireless laptop-lending program: the university of akron experience. technical services quarterly 20, no. 2: 1–12.

dugan, robert e. 2001. managing laptops and the wireless network at the mildred f. sawyer library. journal of academic librarianship 27, no. 4: 295–298.

jordy, matthew l. 1998. the impact of user support needs on a large academic workflow as a result of a laptop check-out program. master's thesis, university of north carolina.
lyle, heather. 1999. circulating laptop computers at west virginia university. information outlook 3, no. 11: 30–32.

myers, penelope. 2001. laptop rental program, temple university libraries. journal of interlibrary loan, document delivery, and information supply 12, no. 1: 35–40.

monash university caulfield library. 2004. laptop users and wireless network survey. www.its.monash.edu.au/staff/networks/wireless/review/caul-lapandnetsurvey.pdf (accessed june 8, 2005).

monmouth university. 2003. testing the wireless waters: a survey of potential users before the implementation of a wireless notebook computer lending program in an academic library. http://bluehawk.monmouth.edu/~hholden/wwl/wireless_survey_results.html (accessed june 8, 2005).

murray state university. 2002. library laptop computer usage survey results. www.murraystate.edu/msml/laptopsurv.htm (accessed june 8, 2005).

oddy, elizabeth carley. 2002. laptops for loan. library and information update 1, no. 4: 54–55.

vaughn, james b., and brett burnes. 2002. bringing them in and checking them out: laptop use in the modern academic library. information technology and libraries 21, no. 2: 52–62.

west, carol. 2005. librarians pleased with results of student survey. southern new hampshire university. www.snhu.edu/3174/asp (accessed june 8, 2005).

williams, joe. 2003. taming the wireless frontier: pdas, tablets, and laptops at home on the range. computers in libraries 23, no. 3: 10–12, 62–64.

from our readers: virtues and values in digital library architecture
mark cyzyk

mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect, library digital programs group, sheridan libraries, johns hopkins university in baltimore.

editor's note: "from our readers" will be an occasional feature, highlighting ital readers' letters and commentaries on timely issues.

at the fall 2007 coalition for networked information (cni) conference in washington, d.c., i presented "a survey and evaluation of open-source electronic publishing systems." toward the end of my presentation was a slide enumerating some of the things i had personally learned as a web application architect during my review of the systems under consideration:

■ platform independence should not be neglected.
■ one inherits the flaws of external libraries and frameworks. choose with care.
■ installation procedures must be simple and flawless.
■ don't wake the sysadmin with "slap a gui on that xml!"—and push application administration out, as much as possible, to select users.
■ documentation must be concise, complete, and comprehensive. "i can't guess what you're thinking."

initially, these were just notes i thought might be useful to others, figuring it's typically helpful to share experiences, especially at international conferences. but as i now look at those maxims, it occurs to me that when abstracted further they point in the direction of more general concepts and traits—concepts and traits that accurately describe us and the products of our labor if we are successful, and prescribe to us the concepts and traits we need to understand and adopt if we are not. in short, peering into each maxim, i can begin to make out some of the virtues and values that underlie, or should underlie, the design and architecture of our digital library systems.

■ freedom and equality

platform independence should not be neglected.
“even though this application is written in platformindependent php, the documentation says it must be run on either red hat or suse, or maybe it will run on solaris too, but we don’t have any of these here.” while i no doubt will be heartily flamed for suggesting that microsoft has done more to democratize computing than any other single company, i nevertheless feel the need to point out that, for many of us, windows server operating systems and our responsibility for administering them way back when provided the impetus for adding our swipe-card barcodes to the acl of the data center—surely a badge of membership in the club of enterprise it if ever there was one. you may not like the way windows does things. you may not like the way microsoft plays with the other boys. but to act like they don’t exist is nothing more than foolish burying one’s head in the *nix sand. windows servers have proven themselves time and again as being affordable, easily managed, dependable, and, yes, secure workhorses. windows is the ford pickup truck of the server world, and while that pickup will some day inevitably suffer a blowout of its twenty-year-old head gasket (and will therefore be respectfully relegated to that place where all dearly departed trucks go), it’s been a long and good run. we should recognize and appreciate this. windows clearly has a place in the data center, sitting quietly humming alongside its unix and linux brothers. i imagine that it actually takes some effort to produce platform-dependent applications using platform-independent languages and frameworks. such effort should be put toward other things. keep it pure. and by that i mean, keep it platform independent. freedom to choose and presumed equality among the server-side oses should reign. n responsibility and good sense one inherits the flaws of external libraries and frameworks. choose with care. so you’ve installed the os, you’ve installed and configured the specified web server, you’ve installed and configured the application platform, you’ve downloaded and compiled the source, yet there remains a long list of external libraries to install and configure. one by one you install them. suddenly, when you get to library number 16 you hit a snag. it won’t install. it requires a previous version of library number 7, and multiple versions of library number 7 can’t be installed at the same time on the same box. worse yet, as you take a break to read some more of the documentation, it sure looks like required library number 19 is dependent on the current version of library number 7 and won’t work with any previous version. and could it be that library number 21 is dependent on library number 20, which is dependent on library number 23, which is dependent on—yikes—library number 21? mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect, library digital programs group, sheridan libraries, johns hopkins university in baltimore. from our readers: virtues and values in digital library architecture | cyzyk 9 all things come full circle. but let’s suppose you’ve worked out all of these dependencies, you’ve figured out the single, secret order in which they must install, you’ve done it, and it looks like it’s working! yet, when you go to boot up the web service, suddenly there are errors all over the place, a fearsome crashing and burning that makes you want to go home and take a nap. something in your configuration is wrong? something in the way your configuration is interacting with an external library is wrong? 
you search the logs. you gather the relevant messages. they don't make a lot of sense. now what to do? you search the lists, you search the wikis to no avail, and finally, in desperation, you e-mail the developers. "but that's a problem with library x, not with our application." au contraire. i would like to strongly suggest a copernican revolution in how we think about such situations. while it's obvious that the developers of the libraries themselves are responsible for developing and maintaining them, i'd like to suggest that this does not relieve you, the developer of a system that relies on their software, from responsibility for its bugs and peculiar configuration problems. i'd like to suggest that, far from pushing responsibility in the case mentioned above out to the developers of the malfunctioning external library, you, in choosing that library in the first place, have now inherited responsibility for it. even if you don't believe in this notion of inheritance, if you would please at least act as if it were true, we'd all be in a better place. part of accepting this kind of responsibility is you then acting as a conduit through which we poor implementers learn the true nature of the problem and any solutions or temporary workarounds we may apply so that we can get your system up and running pronto. in the end, it's all about your system. your system as a whole is only as strong as the weakest link in its chain of dependencies.

■ simplicity and perfection

installation procedures must be simple and flawless.

it goes without saying that if we can't install your system we a fortiori can't adopt it for use in our organization. i remember once having such a difficult time trying to get a system up and running that i almost gave up. i tried first to get it running against apache 1.3, then against apache 2.0. i had multiple interactions with the developers. i banged my head against the wall of that system for days in frustration. the documentation was of little help. it seemed to be more part of an internal documentation project, a way for the developers to communicate among themselves, than to inform outsiders like me about their system. and related to this i remember driving to work during this time listening to a report on npr about the famous hopkins pediatric neurosurgeon, dr. ben carson. apparently, earlier in the week he had separated the brains of siamese twins and the twins were now doing fine, recuperating. the npr commentator marveled at the intricacy of the operation and at the fact that the whole thing took, i believe, five hours. "five hours? five hours?!" i exclaimed while barreling down the highway in my vintage 1988 ford ranger pickup (head gasket mostly sealed tight, no compression leakage). "i can't get this system at work installed in five days!" our goal as system architects needs to be that we provide to our users simple and flawless installation procedures so that our systems can, on average, be installed and configured in equal or less time than it takes to perform major brain surgery.1 "all in an afternoon" should become our motto. i am happy to find that there are useful and easy to use package managers, e.g., yum and synaptic, for doing such things on various linux distributions. windows has long had solid and sophisticated installation utilities. tomcat supports drop-in-place war files. when possible and appropriate, we need to use them.
■ justice and e-z livin

don't wake the sysadmin with "slap a gui on that xml!"—and push application administration out, as much as possible, to select users.

i remember reading plato's republic as an undergraduate and the feeling of being let down when the climax of the whole thing was a definition in which "justice" simply is each man serving his proper place in society and not transgressing the boundaries of his role. "that's it?" i thought. "so you have this rigidly hierarchical society and each person in it knows his role and knows in which slot his role fits—and keeping to this is 'justice'?" this may not be such a great way to structure a society, but now that i think about it, it's a great way to structure a computer application. sit down and carefully look at the functions your program will provide. then create a small set of user roles to which these functions will be carefully mapped. in the end you will have a hierarchical structure of roles and functions that should look perfectly simple and rational when drawn on a piece of paper. and while the superuser role should have power over all and access to all functions in the application, the list of functions that he alone has access to should be small; i.e., the actual work of the superuser should be minimized as much as possible by making sure that most functions are delegated to the members of other, appropriate, proper user roles. doing this happily results in what i call the state of e-z livin: the last thing you want is for users to constantly be calling you with data issues to fix. you therefore will model management of the data—all of it—and the configuration of the application itself—most of it—directly into the architecture of the application, provide users the guis they need to configure and manage things themselves, and push as much functionality as you can out to them where it belongs. let them click their respective ways to happiness and computing goodness. you build the tool, they use it, and you retire back to the land of e-z livin. users are assigned to their roles, and all roles are in their proper places. application architecture justice is achieved.

■ clarity and wholeness

documentation must be concise, complete, and comprehensive. "i can't guess what you're thinking."

as system developers we've probably all had the magical experience of a mind meld with a fellow developer when working intensively on a project. i have had this experience with two other developers, separately, at different stages of my career. (one of them, in fact, used to point out to everyone that, "between the two of us, we make one good developer!") this is a wonderful and magical and productive working relationship in which to be, and it needs to be recognized, supported, and exploited whenever it happens. you are lucky if you find yourself designing and developing a system and your counterpart is reading your mind and finishing your sentences. however, just as it's best to leave that nice young couple cuddling in the corner booth alone, so too it really doesn't make a lot of sense to expect the mind-melded developers to turn out anything that remotely resembles coherent and understandable documentation. those undergoing a mind meld by definition know perfectly well what they mean. to the rest of us it just feels like we missed a memo.
if you have the luxury, make sure that the one writing the documentation is not currently undergoing a mind meld with anyone else on the development team. scotty typically stayed behind while he beamed the others down. beam them down. be that scotty. you do the world a great service by staying behind on the ship and dutifully reporting, clearly and comprehensively, what's happening down on the red planet. to these five maxims, and their corresponding virtues, i would add one more set, one upon which the others rely:

■ empathy and graciousness

you are not your audience.

at least in applied computing fields like ours, we need to break with the long-held "guru in the basement" mentality. the actions of various managerial strata have now ostensibly acknowledged for us that technical expertise, especially in applied fields, is a commodity, i.e., it can be bought. a dearth of such expertise is remedied by simply applying money to the situation—admittedly difficult to do at the majority of institutions of higher education, but a common occurrence at the wealthiest. nevertheless, the dogmatic hold of the guru has been broken, and the magical aura that once draped her is not now so resplendent—her relative rarity, and the clubby superiority that depended upon it, has been diluted significantly by the sheer number of counterparts who can and will gleefully fill her function. we respect, value, and admire her; it's just that her stranglehold on things has (rightfully) been broken. and while nobody is truly indispensable, what is more difficult and rare to find is someone who has the guru's same level of technical chops coupled with a genuine empathic ability to relate to those who are the intended users of her systems and services. unless your systems and services are geared primarily toward other developers, programmers, and architects—and presumably they are not, nor, in the library world, should they be—your users will typically be significantly unlike you. let me repeat that: your users are not like you. rephrased: you are not your audience. when looking back over the other maxims, values, and virtues mentioned in this essay, then, the moral-psychological glue that binds them all is composed of empathy for our users—faculty, students, librarians, non-technical staff—and the graciousness to design and carry out a project plan in a spirit of openness, caring, flexibility, humility, respect, and collaboration. when empathy for the users of our systems is absent—and there are cases where you can actually see this in the design and documentation of the system itself—our systems will ultimately not be used. when the spirit of graciousness is broken, men become robots, mere rule followers, and users will boycott using their systems and will look elsewhere, naturally preferring to avoid playing the simon-says games so often demanded by tech folk in their workaday worlds; there is a reason the comic strip dilbert is so funny and rings so true. when confronted with a lack of empathy and graciousness on our part, the users who can boycott using our systems and services will boycott using our systems and services.
and we'll be left out in the rain, feeling like, as bonnie raitt once sadly sang, "i can't make you love me if you don't / i can't make your heart feel something it won't." empathy and graciousness, while not guaranteeing enthusiastic adoption of our systems and services, are a necessary precondition for users even countenancing participation. there are undoubtedly other virtues and values that can usefully be expounded in the context of digital library architecture—consistency, coherence, and elegance immediately come to mind—and i could go on and on analyzing the various maxims surrounding these that bubble up through the stack of consciousness during the course of the day. yet doing so would conflict with another virtue i think is key to the success and enjoyment of opinion-piece essays like this and maybe even of other sorts of publications and presentations: brevity.

note

1. a colleague of mine has since informed me that carson's operation took twenty-five hours, not five. nevertheless, my admonition here still holds. when installation and configuration of our systems are taking longer, significantly longer, than it takes to perform major brain surgery, surely there is something amiss?

communications

marc format simplification

d. kaye capen: university of alabama, university.

this is a summary of a paper written on the consideration of the feasibility as well as the benefits, disadvantages, and consequences of simplification of the marc formats for bibliographic records.1 the original paper was commissioned in june 1981 by the arl task force on bibliographic control as one facet in exploring the perceived high costs of cataloging and adhering to marc formats in arl libraries. the conclusions and recommendations, however, are entirely those of the author, and the opinions and judgments stated here result from a wide-ranging canvass of technical services people, computer people, and/or library administrators. because the marc format has so many uses, the paper is divided into five perspectives from which the marc format can be viewed: history, standards, and codes; present purposes; library operations; computer operations; and online catalogs. the library of congress has already begun a review of the marc format and has distributed a draft document.2 the general thrust of that review is a close examination of the marc format in an attempt to begin to lay the foundation on which revised marc formats can firmly stand, particularly in regard to content designation (tags, indicators, and subfield codes used to identify and characterize the data explicitly). as that review deals with the very specific, this paper aims generally at attempting to paint with broad strokes a picture of today's marc in its many relationships, benefits, costs, and what the impact would be to the whole from any change to the part.

perspective: marc history, standards, and codes

relationships

the original marc format document established conventions for encoding data for monographs. though it was understood that early applications were going to relate to the production of catalog cards, the marc designers looked ahead to an increasing emphasis on data retrieval applications. other design considerations included, for example, the necessity for providing for complex computer filing, allowance for a variety of data processing equipment, and an attempt to provide for some analytical work (more specific description of contents notes or other types of analysis).
later the single marc ii format was transformed into a series of formats, and as time passed, those formats became inextricably tied to other developments at the national and international levels: the international standard bibliographic descriptions, the anglo-american cataloguing rules, 2d ed., unimarc, the national level bibliographic records, and the national and international communications standards, e.g., ansi z39.2-1979 and iso 2709.

benefits

the benefits of the marc formats and other standards and codes have been substantial both philosophically and pragmatically. the sharing of cataloging records through the computer-based, online networks has been shown in a variety of cost studies to have contained the rate of rise of per unit cost. a further benefit of the marc formats is the momentum its creation gave to the steady movement toward standardization, which can benefit individual libraries in a number of ways: first, bibliographic information can be exchanged among libraries and countries. second, in recent years we have moved steadily toward creating an environment in which the library of congress would become one of many authoritative libraries, thus enhancing the shareability of records.

costs

the early costs of the development and implementation of the marc formats were borne by lc (aided by council on library resources funds). lc continues to bear most of the costs of marc formats, such as new marbi proposals, duplication and distribution of documentation, and so forth. direct investment of library dollars came through the purchase of the marc tapes and the development of systems to receive, process, and output data in marc formats.

impact of change

throughout the years of its use, the marc format content designation and content rules have been augmented or modified. in the beginning, however, databases were small and changes could be absorbed more readily. the number and complexity of the formats have increased, as have the interrelationships of the marc formats with other standards and codes, resulting in a present environment in which the impact of change is felt more strenuously.

perspective: present relationships and constraints

relationships

today's close interrelationships between the marc formats and other codes and standards affect both library and computer operations. though, for example, the general international standard bibliographic description was implemented by the library community prior to the adoption of aacr2, the second edition of the rules has firmly incorporated the isbds. when this format description system is combined with the machine-based marc formats, some isbd information will be supplied by humans and some generated by programmed machine manipulations. as a second example, in the last couple of years, the library of congress has spearheaded the development of national level bibliographic record(s) which define the specific data elements that should be included by any organization creating cataloging records which may also be shared with other organizations or be acceptable for contribution to a national database. as the logical idea of a national database comes to fruition, it is necessary for the marc format to provide for greater specificity in the coding of originating library, modifying library, and so forth.

benefits

the benefits of the use of the marc format continue to lie in the ease with which bibliographic information can be shared and the concomitant beneficial impact on cost control.
in addition, the marc format supports a host of other standards and codes, and the benefit from these relationships has been consistency in and fostering of standards development. in the bibliographic arena, the more that standards are developed-locally, regionally, nationally, and internationally-the more we will be able to transmit and share bibliographic data, thus controlling the costs of original cataloging. on the other hand, we also "pay" when we standardize.

cost

the two costs associated with increased standardization are additional time and thus cost required to meet standards, and the increased expense of maintaining local practices which may often be idiosyncratic. in relation to the latter, while many local idiosyncrasies are often unnecessary and counterproductive, there are generally some which have become an integral part of a large catalog database or upon which a major procedural activity is based. but, to benefit from compliance with standards, increasingly we will move away from local practices. in terms of the time required to adhere to the marc format, it is possible to continue to utilize the format (or participate in systems that use it) and yet control the amount of complexity with which one has to deal. both aacr2 and national level bibliographic record documents allow for "levels of description" which provide for more or less description; and various online networks allow, in a similar manner, for limited input standards. as we view the array of standards and codes which together make up today's bibliographic scene, we can see that each of the separate elements is consistent within itself, is understandable, and counts for only a portion of the costs associated with the cataloging process. the combination of elements, however, begins an accretion of complexity that for most requires an effort of organization and education in order to control work flow and meet standards.

impact of change

because the marc format is closely interwoven with a number of national and international codes and standards, changes to the format would have implications far beyond the local library. at the very least, discussions would have to involve a host of individuals and groups, all at different stages of development and implementation based upon the present marc format.

perspective: library operations

relationships

in the library-operations perspective, any operations related to the marc format have to be viewed as only one of many elements which must be interfaced with daily work flow. let us look, for example, at the amount of time which might be expended in a typical large academic library by cataloging personnel in training and ongoing work activities required in marc-related operations. in those libraries which obtain access to cataloging databases as members of networks, contact with the marc format is filtered through the standards, requirements, marc implementation design, documentation and other related training facilities of the network. libraries which maintain their own databases do the same kind of filtering, though staff may have somewhat more control of the user cordiality of the interface. the shared networking environment, however, generally seems to imply more standards and requirements because of the attempt to guarantee as much "shareability" as possible.
libraries participating in oclc, for example, must train staff in the following codes: aacr1; aacr2; standard subject heading codes; standard classification codes; oclc/marc formats for each type of material being cataloged; oclc bibliographic input standards; oclc level i and level k input standards; oclc systems users guides; in some instances, input standards documents for regional or special-interest cooperatives; local library interpretations, procedures, and standards. any close review of the time library staff expend in the use of these tools for either training or ongoing operations reveals that marc per se requires only a limited proportion of a typical library staff person's day. while training may be intensive at either the beginning of a person's job or at the beginning of work with a new type/format of material, this portion of the cataloging unit cost is small.

benefits, costs

in the cataloging activity, the benefits from the use of the marc formats are at least two: first, the marc format as part of an online cataloging system permits the machine-production of catalog cards at a major savings over manual production. second, access to a shared cataloging database permits the use of "clerical" catalogers at an estimated unit cost saving per book of twenty dollars when compared to "original" cataloging.3 third, depending upon the information available in the cataloging record, the time required for decision making during the cataloging process can be decreased significantly.

impact of change

it was the general consensus of the technical services people i contacted that simplification of the formats through the consistent assignment of tags would make training and introduction to new formats somewhat easier, but that any savings of time would probably be trivial. there was no consensus that either simplification or shortening would result in any significant time or cost savings. to a certain extent, the use of the very specific marc formats has made the descriptive cataloging process (and the training to undertake it) clearer in that the logical relationships and description of the data elements are so clearly exposed through the assignment of tags and other codes. also, once initial familiarity with the format(s) is achieved, ongoing use becomes second nature. it is also possible for cataloging staff to control the complexity with which they will deal through the use of less than "full," but still nationally acceptable, levels of cataloging and, hence, marc coding. finally, most technical services people believe that cataloging and maintenance activities in libraries have always been complex, requiring long and detailed procedures and intricate work flow. while membership in networks requires new skills and knowledge, it is the sum of the whole rather than the difficulty of any single portion which affects unit costs today. changing the marc format through either simplification or shortening would have only a slight effect on the total technical services operation and costs.

perspective: the computer operations environment

relationships

in looking at computer operations, there are at least two major subdivisions: operations that serve only one client (e.g., a library system serving itself) or operations that serve many clients (e.g., rlin or blackwell/north america). the constraints differ for each operation and are further complicated by whether or not the computer operation must be able to produce as well as accept bibliographic records in a marc format.
each computer facility, for example, can have distinct operating software depending upon the type and mix of computing equipment used. in addition, each computing facility translates the marc-formatted records into an internal processing format which may differ extensively from marc. too, further tailoring may be done for batch processing as opposed to online operations, and computer operations which serve a single user may not have to re-create records in the marc format and may even more radically redesign the marc-formatted records for internal use. as changes to the marc format occur over the years, each computer system will write additional software to incorporate those changes into the then existing system. in some instances, it may be too difficult to attempt to convert old databases to reflect changes in marc coding, and there will then exist an "old" database and a "new" database for that particular marc field or subfield. since changes have occurred in many fields, most databases are an amalgam of new and old interpretations (this is true in relation to cataloging codes, too) of marc coding, and original internal software design may reflect the same type of patchwork quilt. operating these computer systems is complicated, in addition, by the fact that a wide range of user library needs and desires must be accommodated. indeed, a report prepared by hank epstein for the conference to explore machine-readable bibliographic interchange (cembi) revealed after an exhaustive review of the use of marc data elements that there was no data element not used by someone!4

benefits

benefits that accrue to computing operations as a result of the marc format include the use of what was called "a pretty decent general communications format," which facilitates communications, card/com production, and online information retrieval. as a communications format it is as coherent as any other structure for carrying bibliographic data. because the format allows for a very specific level of detail in description, computing operations can supply a variety of products to fill a variety of needs.

costs

while specific cost information was not available for inclusion in this paper, discussion does reveal some widely held generalizations. first, the marc format does not seem to be any more complex or costly to use than other variable field communications formats. beginning programmers are generally introduced first to the internal communications format of their particular computing system, and when they come to the marc tags they rapidly become familiar with the coding through experience. indeed, if the programmers know the structure of and have a specification for the format, they can work with that format even though they may be unfamiliar with it from the users' point of view. thus, the format itself, and training in its use, does not seem to be significantly costly. second, every change in the marc format requires some programming effort and may or may not require concomitant changes in the database. the consensus of the computer people with whom i spoke was that the sophistication and specificity of the marc formats was a good thing, but the inconsistencies among formats are problematical. the benefits of consistency can be important, but to justify changes financially, the major changes should be done at one time.
indeed, most individuals doubted whether or not there was sufficient capital in these straitened times to be able to implement consistently a major marc format change, and this is from the perspective of both the operations serving one and many users.

impact of change

without a philosophical and practical framework (or benchmark) against which to compare the benefits and costs of alternative solutions to marc format maintenance issues, and without a better and more comprehensive description of the requirements of the internal processing formats of the computer operations, it is difficult to assess clearly the costs and benefits of marc format changes. it does seem to be the case presently that, once established, computer operations can deal with the complexity and specificity of the marc format without undue ongoing financial investment. the strength of the marc format for computer operations lies in its specificity. for the batch processing environment especially, the marc format is a reasonably efficient format and one that facilitates development. its inefficiencies are not drastic and its specificity buys valuable flexibility. severe cuts or major simplifications would be a mistake since discontinuing specificity is a one-way street-once it is gone, it cannot be retrieved. the ability of the machine to assist in editing is weakened by the loss of specificity and it then becomes more difficult to edit out poor data. simplification through consistency, rather than shortening, would produce the most beneficial impact-though it must be done carefully to be cost beneficial.

perspective: online catalogs

relationship

the major difficulties facing us when we attempt to discuss the relationship of the marc format to online catalogs are that, first, we know so little about how people think when they use our card catalogs; and, second, we have so little experience with how those thought and use patterns might change when the online catalog replaces the card catalog. another aspect of online library system development is the combination of subsystems such as acquisitions, serials control, or authority control with the online catalog and the implications of such a combination for system design, the internal processing format, and compatibility with the marc format. the index design of most large online catalogs or information retrieval systems today relies upon precoordinated search keys in order to facilitate the large sorting activities that have to occur. the second indicator in the 700 field, for example, is designed for the purpose of formulating search keys, filing added entries, or for selecting alternative secondary added entries. this type of specificity is necessary for both card production and online retrieval. taken together, all of these considerations make most systems and library technical people hesitate to recommend any major changes to the marc format at this time.

benefits

at this time, therefore, in terms of information retrieval, there does not seem to be any major force toward either simplifying or shortening the marc format to facilitate retrieval. this becomes an even more cogent sentiment when we consider that major development efforts have already been begun in the areas of online catalog access and information retrieval. delays in these development efforts now caused by changes in the marc formats could be enormously wasteful of the time and effort already invested, and could postpone urgently needed implementation of new, easily maintainable online systems.
costs

there is no firm cost data to guide us in considering the impact of marc format changes in the information retrieval environment. generally accepted assumptions are, however, that because of our lack of knowledge and experience in this area, it is simply too risky and potentially costly to experiment.

impact of change

overall, without more experience in this area, it is the general opinion that the fullest level of descriptive specificity of the marc format might be required to design and implement online catalogs/information retrieval systems which can be responsive to the needs of a variety of users and levels of information. interaction with other subsystems and formats is also incomplete, thus clouding our vision of the impact of change over the breadth of the library community.

summary and conclusions

the original purpose of the marc format is still a cogent and necessary one-that of allowing for a great variety of individual library needs for products, practices, and policies via a standardizing communications format. both catalog card production and online retrieval necessitate the same level of specificity, though particular tags, indicators, and subfield codes may vary. as we look toward a variety of authoritative cataloging sources, the marc format, in addition to a specific coding of bibliographic information, might also have to specify descriptions of cataloging actions so that the greatest degree of "shareability" might exist. some of this related authority-type information will either be carried as part of the marc format or in some manner as linked records. the computer operations that utilize the marc formats exist under the constraints of a variety of internal processing formats and design constraints. for each internal processing system, however, the specificity of the marc format offers flexibility and efficiency for a number of different processes and products. taken by itself, the marc format is no more difficult to work with than any other standard or technique for both librarians and computer people. while it might be useful for librarians to implement training aids such as online documentation, access to library manuals (particularly that of the library of congress), and so forth, the benefits of aids such as these are trivial since the coding can be learned rather quickly through experience. for computing people, on the other hand, changes in the formats can be very expensive and disruptive. there is general agreement, moreover, that over the long term we have got to be able to maintain the marc format in response to experience with retrieval and other theoretical and technical advances. the main thrust of maintenance in the computing realm is consistency across formats, but approaching this type of simplification requires a number of preliminary steps if it is to be implemented effectively. we need to develop a vocabulary for jointly discussing the elements of the problem. in addition, a major review needs to be undertaken of the internal processing formats and design constraints of the major computer operations-both to serve as a benchmark for measuring the impact of format changes, and as a guideline for newly developing systems to assist in avoiding mistakes in the development of new computer operations. someone needs to be thinking about and designing the ultimate, comprehensive marc format-not to be implemented, but to serve as a springboard for discussion and for consideration of system design.
we need to establish limitations on what we will handle with the marc formats and where we will begin to rely on underlying formats instead. the development of a comprehensive marc conceptualization would also provide a protocol for undertaking the improvement of marc and would serve as a benchmark against which local systems could be compared. at the very least, the steps described here would facilitate the consideration and implementation of making the formats consistent across types of material, a goal which is seen by all to be highly desirable. we need a format which is consistent, easily maintainable without being uncontrollably disruptive, and responsive to changing needs which are likely to accelerate as we gain experience with online systems. rather than recommending or supporting the implementation of specific changes to the marc format, it is essential that the library community begin to establish the framework and benchmarks necessary to maintain the marc formats over the long term as well as to guide short-term considerations. arl and others can play an important role in undertaking and encouraging a broader approach to this pressing problem. such an approach will not only reduce the risk of decision making, but will also assist in the development of the cost/benefit data needed to enhance consideration of format changes.

references

1. d. kaye capen, simplification of the marc format: feasibility, benefits, disadvantages, consequences (washington, d.c.: association of research libraries, 1981), 22p.
2. "principles of marc format content designation," draft (washington, d.c.: library of congress, 1981), 66p.
3. ichiko t. morita and d. kaye capen, "a cost analysis of the ohio college library center on-line shared cataloging system in the ohio state university libraries," library resources & technical services 21:286-302 (summer 1977).
4. council on library resources bibliographic interchange committee, bibliographic interchange report, no. 1 (washington, d.c.: the council, 1981).

comparing fiche and film: a test of speed

terence crowley: division of library science, san jose state university, san jose, california.

introduction

for more than a decade librarians have been responding to budget pressures by altering the format of their library catalogs from labor-intensive card formats to computer-produced book and microformats. studies at bath,1 toronto,2 texas,3 eugene,4 los angeles,5 and berkeley6 have compared the forms of catalogs in a variety of ways ranging from broad-scale user surveys to circumscribed estimates of the speed of searching and the incidence of queuing. the american library association published a state-of-the-art report7 as well as a guide to commercial computer-output microfilm (com) catalogs pragmatically subtitled how to choose; when to buy.8 in general, com catalogs are shown to be more economical and faster to produce and to keep current, to require less space, and to be suitable for distribution to multiple locations. primary disadvantages cited are hardware malfunctions, increased need for patron instruction, user resistance (particularly due to eyestrain), and some machine queuing. the most common types of library com catalogs today are motorized reel microfilm and microfiche, each with advantages and disadvantages.
microfilm offers file-sequence integrity and thus is less subject to user abuse, i.e., theft, misfiling, and damage; in motorized readers with "captive" reels it is said to be easier to use. disadvantages include substantially greater initial cost for motorized readers; limits on the capacity of captive reels necessitating multiple units for large files; inexact indexing in the most widespread commercial reader; and eyestrain resulting from high speed film movement. microfiche offers a more nearly random retrieval, much less expensive and more versatile readers, and unlimited file size. conversely, the file integrity of fiche is lower and the need for patron assistance in use of machines is said to be greater than for self-contained motorized film readers.

the problem

one of the important considerations not fully researched is that of speed of searching. the toronto study included a self-timed "look-up" test of thirty-two items "not in alphabetical order" given to thirty-six volunteers, of whom thirty finished the test. the researchers found the results "inconclusive" but noted that seven of the ten librarians found film searching the fastest method. "average" time reported for searching in card catalogs was 37.3 minutes.

smartphones: a potential discovery tool

wendy starkweather and eva stowers

wendy starkweather (wendy.starkweather@unlv.edu) is director, user services division, and eva stowers (eva.stowers@unlv.edu) is medical/health sciences librarian at the university of nevada las vegas libraries.

the anticipated wide adoption of smartphones by researchers is viewed by the authors as a basis for developing mobile-based services. in response to the unlv libraries' strategic plan's focus on experimentation and outreach, the authors investigate the current and potential role of smartphones as a valuable discovery tool for library users.

when the dean of libraries announced a discovery mini-conference at the university of nevada las vegas libraries to be held in spring 2009, we saw the opportunity to investigate the potential use of smartphones as a means of getting information and services to students. being enthusiastic users of apple's iphone, we and the web technical support manager developed a presentation highlighting the iphone's potential value in an academic library setting. because wendy is unlv libraries' director of user services, she was interested in the applicability of smartphones as a tool for users to more easily discover the libraries' resources and services. eva, as the health sciences librarian, was aware of a long tradition of pda use by medical professionals. indeed, first-year bachelor of science nursing students are required to purchase a pda bundled with select software. together we were drawn to the student-outreach possibilities inherent in new smartphone applications such as twitter, facebook, and myspace.

n presentation

our brief review of the news and literature about mobile phones in general provided some interesting findings and served as a backdrop for our presentation:

n a total of 77 percent of internet experts agreed that the mobile phone would be "the primary connection tool" for most people in the world by 2020.1 the number of smartphone users is expected to top 100 million by 2013.
there are currently 25 million smartphone users, with sales in north america having grown 69 percent in 2008.2

n smartphones offer a combination of technologies, including gps tracking, digital cameras, and digital music, as well as more than fifty thousand specialized apps for the iphone and new ones being designed for the blackberry and the palm pre.3 the palm pre offered less than twenty applications at its launch, but one million application downloads had been performed by june 24, 2009, less than a month after launch.4

n the 2009 horizon report predicts that the time to adoption of these mobile devices in the educational context will be "one year or less."5

data gathered from campus users also was presented, providing another context. in march 2009, a survey of university of california, davis (uc-davis) students showed that 43 percent owned a smartphone.6 uc-davis is participating in apple's university education forum. here at unlv, 37 percent of students and 26 percent of faculty and staff own a smartphone.7 the presentation itself highlighted the mobile applications that were being developed in several libraries to enhance student research, provide library instruction, and promote library services. two examples were abilene christian university (http://www.acu.edu/technology/mobilelearning/index.html), which in fall 2008 distributed iphones and ipod touches to the incoming freshman class; and stanford university (http://www.stanford.edu/services/wirelessdevice/iphone/), which participates in "itunes u" (http://itunes.stanford.edu/). if the libraries were to move forward with smartphone technologies, it would be following the lead of such universities. readers also may be interested in joan lippincott's recent concise summary of the implications of mobile technologies for academic libraries as well as the chapter on library mobile initiatives in the july 2008 library technology report.8

n goals: a balancing act

ultimately the goal for many of these efforts is to be where the users are. this aspiration is spelled out in unlv libraries' new strategic plan relating to infrastructure evolution, namely, "work towards an interface and system architecture that incorporates our resources, internal and external, and allows the user to access from their preferred starting point."9 while such a goal is laudable and fits very well into the discovery emphasis of the mini-conference presentation, we are well aware of the need for further investigation before proceeding directly to full-scale development of a complete suite of mobile services for our users. of critical importance is ascertaining where our users are and determining whether they want us to be there and in what capacity. the value of this effort is demonstrated in booth's research report on student interest in emerging technologies at ohio state university. the report includes the results of an extensive environmental survey of their library users.
the study is part of ohio state's effort to actualize their culture of assessment and continuous learning and to use "extant local knowledge of user populations and library goals" to inform "homegrown studies to illuminate contextual nuance and character, customization that can be difficult to achieve when using externally developed survey instruments."10 unlv libraries are attempting to balance early experimentation and more extensive data-driven decision-making. the recently adopted strategic plan includes specific directions associated with both efforts. for experimentation, the direction states, "encourage staff to experiment with, explore, and share innovative and creative applications of technology."11 to that end, we have begun working with our colleagues to introduce easy, small-scale efforts designed to test the waters of mobile technology use through small pilot projects. "text-a-librarian" has been added to our existing group of virtual reference service, and we introduced a "text the call number and record" service to our library's opac in july 2009. unlv libraries' strategic plan helps foster the healthy balance by directing library staff to "emphasize data collection and other evidence based approaches needed to assess efficiency and effectiveness of multiple modes and formats of access/ownership" and "collaborate to educate faculty and others regarding ways to incorporate library collections and services into education experiences for students."12 action items associated with these directions will help the libraries learn and apply information specific to their users as the libraries further adopt and integrate mobile technologies into their services. as we begin our planning in earnest, we look forward to our own set of valuable discoveries.

references

1. janna anderson and lee rainie, the future of the internet iii, pew internet & american life project, http://www.pewinternet.org/~/media//files/reports/2008/pip_futureinternet3.pdf (accessed july 20, 2009).
2. sam churchill, "smartphone users: 110m by 2013," blog entry, mar. 24, 2009, dailywireless.org, http://www.dailywireless.org/2009/03/24/smartphone-users-100m-by-2013 (accessed july 20, 2009).
3. mg siegler, "state of the iphone ecosystem: 40 million devices and 50,000 apps," blog entry, june 8, 2009, tech crunch, http://www.techcrunch.com/2009/06/08/40-million-iphones-and-ipod-touches-and-50000-apps (accessed july 20, 2009).
4. jenna wortham, "palm app catalog hits a million downloads," blog entry, june 24, 2009, new york times technology, http://bits.blogs.nytimes.com/2009/06/24/palm-app-catalog-hits-a-million-downloads (accessed july 20, 2009).
5. larry johnson, alan levine, and rachel smith, horizon report, 2009 edition (austin, tex.: the new media consortium, 2009), http://www.nmc.org/pdf/2009-horizon-report.pdf (accessed july 20, 2009).
6. university of california, davis, "more than 40% of campus students own smartphones, yearly tech survey says," technews, http://technews.ucdavis.edu/news2.cfm?id=1752 (accessed july 20, 2009).
7. university of nevada las vegas, office of information technology, "student technology survey report: 2008–2009," http://oit.unlv.edu/sites/default/files/survey/surveyresults2008_students3_27_09.pdf (accessed july 20, 2009).
8. joan lippincott, "mobile technologies, mobile users: implications for academic libraries," arl bi-monthly report 261 (dec. 2008), http://www.arl.org/bm~doc/arl-br-261-mobile.pdf
(accessed july 20, 2009); ellyssa kroski, "library mobile initiatives," library technology reports 44, no. 5 (july 2008): 33–38.
9. "unlv libraries strategic plan 2009–2011," http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 20, 2009): 2.
10. char booth, informing innovation: tracking student interest in emerging library technologies at ohio university (chicago: association of college and research libraries, 2009), http://www.ala.org/ala/mgrps/divs/acrl/publications/digital/ii-booth.pdf (accessed july 20, 2009); "unlv libraries strategic plan 2009–2011," 6.
11. "unlv libraries strategic plan 2009–2011," 2.
12. ibid.

assignfast: an autosuggest-based tool for fast subject assignment

rick bennett, edward t. o'neill, and kerre kammerer

information technology and libraries | march 2014

abstract

subject assignment is really a three-phase task. the first phase is intellectual—reviewing the material and determining its topic. the second phase is more mechanical—identifying the correct subject heading(s). the final phase is retyping or cutting and pasting the heading(s) into the cataloging interface along with any diacritics, and potentially correcting formatting and subfield coding. if authority control is available in the interface, some of these tasks may be automated or partially automated. a cataloger with a reasonable knowledge of faceted application of subject terminology (fast)1,2 or even library of congress subject headings (lcsh)3 can quickly get to the proper heading but usually needs to confirm the final details—was it plural? am i thinking of an alternate form? is it inverted? etc. this often requires consulting the full authority file interface. assignfast is a web service that consolidates the entire second phase of the manual process of subject assignment for fast subjects into a single step based on autosuggest technology.

background

faceted application of subject terminology (fast) subject headings were derived from the library of congress subject headings (lcsh) with the goal of making the schema easier to understand, control, apply, and use while maintaining the rich vocabulary of the source. the intent was to develop a simplified subject heading schema that could be assigned and used by nonprofessional catalogers or indexers. faceting makes the task of subject assignment easier. without the complex rules for combining the separate subdivisions to form an lcsh heading, only the selection of the proper heading is necessary. the now-familiar autosuggest4,5 technology is used in web search and other text entry applications to help the user enter data by displaying and allowing the selection of the desired text before typing is complete. this helps with error correction, spelling, and identification of commonly used terminology. prior discussions of autosuggest functionality in library systems have focused primarily on discovery rather than on cataloging.6-11

rick bennett (rick_bennett@oclc.org) is a consulting software engineer in oclc research, edward t. o'neill (oneill@oclc.org) is a senior research scientist at oclc research and project manager for fast, and kerre kammerer (kammerer@oclc.org) is a consulting software engineer in oclc research, dublin, ohio.
the literature often uses synonyms for autosuggest, such as autocomplete or type-ahead. since assignfast can lead to terms that are not being typed, autosuggest seems most appropriate and will be used here. the assignfast web service combines the simplified subject choice capabilities of fast with the text selection features of autosuggest technology to create an in-interface subject assignment tool. much of a full featured search interface for the fast authorities, such as searchfast,12 can be integrated into the subject entry field of a cataloging interface. this eliminates the need to switch screens, cut and paste, and make control character changes that may differ between the authority search interface and the cataloging interface. as a web service, assignfast can be added to existing cataloging interfaces. in this paper, the actual operation of assignfast is described, followed by how the assignfast web service is connected to an interface, and finally by a description of the web service construction.

assignfast operation

an authority record contains the established heading, see headings, and control numbers that may be used for linking or other future reference. the relevant fields of the fast record for motion pictures are shown here:

control number: fst01027285
established heading: motion pictures
see: cinema
see: feature films--history and criticism
see: films
see: movies
see: moving-pictures

in fast, the facet of each heading is known. motion pictures is a topical heading. the see references are unauthorized forms of the established heading. if someone intended to enter cinema as a subject heading, they would be directed to use the established heading motion pictures. for a typical workflow, the subject cataloger would need to leave the cataloging interface, search for "cinema" in an authority file interface, find that the established heading was motion pictures, and return to the cataloging interface to enter the established heading. the figure below shows the same process when assignfast is integrated into the cataloging interface. without leaving the cataloging interface, typing only "cine" shows both the see term that was initially intended and the established heading in a selection list.

figure 1. assignfast typical selection choices.

selecting "cinema use motion pictures" enters the established term, and the entry process is complete for that subject.

figure 2. assignfast selection result.

the text above the entry box provides the fast id number and facet type. as a web service, assignfast headings can be manipulated by the cataloging interface software after selection and before they are entered into the box. for example, one option available in the assignfast demo is marcbreaker format.13 marcbreaker combines marc field tagging and allows diacritics to be entered using only ascii characters. using marcbreaker output, assignfast returns the following for " ":

=651 7$abrazil$zs{tilde}ao paulo$0(ocolc)fst01205761$2fast

in this case, the output includes marc tagging of 651 (geographic), as well as subfield coding ($z) that identifies the city within brazil, that it's a fast heading, and the fast control number.
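as a rough illustration of the kind of post-selection manipulation just described, the sketch below shows how an interface might turn an assignfast suggestion object into a marcbreaker-style field string. this is not the article's own code: the property names (idroot, tag, indicator, raw, breaker, auth) follow the web service response described later, but the authority-tag-to-bibliographic-tag mapping and the blanket use of $x for subdivisions are simplifying assumptions made only for illustration.

// hypothetical sketch: turn an assignfast suggestion into a marcbreaker-style field.
// property names follow the fastsuggest response; the tag mapping and the generic
// use of $x for subdivisions are assumptions, not oclc's implementation.
function toMarcBreakerField(item) {
  var bibTagByAuthTag = { 100: '600', 110: '610', 111: '611',
                          130: '630', 150: '650', 151: '651', 155: '655' };
  var bibTag = bibTagByAuthTag[item.tag] || '650';
  // prefer the marcbreaker form (ascii-escaped diacritics), then the raw subfielded
  // form, then the plain display form.
  var heading = item.breaker || item.raw || item.auth;
  // the display form separates subdivisions with "--"; convert those to $x markers.
  var body = '$a' + heading.replace(/--/g, '$x');
  return '=' + bibTag + '  7' + body + '$0(OCoLC)' + item.idroot + '$2fast';
}

// example: the topical record shown above would yield
// "=650  7$amotion pictures$0(OCoLC)fst01027285$2fast"
// toMarcBreakerField({ tag: 150, idroot: 'fst01027285',
//   auth: 'motion pictures', raw: '', breaker: '', indicator: ' ' });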
the information is available in the assignfast result to fill one or multiple input boxes and to reformat as needed for the particular cataloging interface.

addition to web browser interfaces

as a web service, assignfast could be added to any web-connected interface. a simple example is given here to add assignfast functionality to a web browser interface using javascript and jquery (http://jquery.com). these technologies are commonly used, and other implementation technologies would be similar. example files for this demo can be found on the oclc developers network under assignfast.14 the example uses the jquery.autocomplete function.15 first, the script packages jquery.js, jqueryui.js, and the style sheet jquery-ui.css are required. version 1.5.2 of jquery and version 1.8.7 of jquery-ui were used for this example, but other compatible versions should be fine. these are added to the html in script and link tags. the second modification to the cataloging interface is to surround the existing subject search input box with a set of div tags.
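the markup itself is not reproduced in the text, so the following is only a hypothetical sketch of what the two changes just described might look like. the element ids (existingbox, extrainformation) are taken from the setuppage() function shown next; the file paths, the wrapper class, and the assignfastcomplete.js include are assumptions standing in for the demo's actual files.

<!-- hypothetical sketch, not the published demo files -->
<link rel="stylesheet" href="jquery-ui.css">
<script src="jquery.js"></script>
<script src="jqueryui.js"></script>
<script src="assignfastcomplete.js"></script>  <!-- autosubjectexample, formatsuggest, etc. -->

<!-- the existing subject input box, wrapped in div tags as described above -->
<div class="subject-entry">
  <label for="existingbox">subject heading:</label>
  <input id="existingbox" type="text" size="60">
  <div id="extrainformation"></div>
</div>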
the final modification is to add javascript to connect the assignfast web service to the search input box. this function should be called when the page is set up:

function setuppage() {
  // connect the autosubject to the input areas
  jquery('#existingbox').autocomplete({
    source: autosubjectexample,
    minlength: 1,
    select: function(event, ui) {
      jquery('#extrainformation').html("fast id " + ui.item.idroot +
        " facet " + gettypefromtag(ui.item.tag) + "");
    } //end select
  }).data("autocomplete")._renderitem = function(ul, item) {
    formatsuggest(ul, item);
  };
} //end setuppage()

the source: autosubjectexample tells the autocomplete function to get the data from the autosubjectexample function, which in turn calls the assignfast web service. this is in the assignfastcomplete.js file. in the select: function, the extrainformation text is rewritten with additional information returned with the selected heading. in this case, the fast number and facet are displayed. the generic _renderitem of the jquery.autocomplete function is overwritten by the formatsuggest function (found in assignfastcomplete.js) to create a display that differentiates the see from the authorized headings that are returned in the search. the version used for this example shows "see heading use authorized heading" when a see heading is returned, or simply the authorized heading otherwise.

web service construction

the autosuggest service for a fast heading was constructed a little differently than the typical autosuggest. for a typical autosuggest for the term motion picture from the example given above, you would index just that term. as the term was typed, motion picture and other terms starting with the text entered so far would be shown until you resolved the desired heading. for example, typing in "m t" might give:

motion pictures
motion picture music
employee motivation
diesel motor
mothers and daughters

for the typical autosuggest, the term indexed is the term displayed and is the term returned when selected. for assignfast, both the established and see references are indexed. however, when typing resolves a see heading, both the see heading and its established heading are displayed. only the established heading is selected, even if you are typing the see heading. for assignfast, the "m t" result now becomes:

features (motion pictures) use feature films
motion pictures
motorcars (automobiles) use automobiles
motion picture music
background music for motion pictures use motion picture music
motion pictures for the hearing impaired use films for the hearing impaired
documentaries, motion picture use documentary films
mother of god use mary, blessed virgin, saint

the headings in assignfast are ranked by how often they are used in worldcat, so headings that are more common appear at the top. to place the established heading above the see heading when they are similar, the established heading is also ranked higher than the see for the same usage. assignfast can also be searched by facet, so if only topical or geographic headings are desired, only headings from these facets will be displayed. the web service uses a solr16 search engine running under tomcat.17 this provides full text search and many options for cleaning and manipulating the terms within the index.
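the autosubjectexample source function itself lives in the demo's assignfastcomplete.js and is not reproduced in the text, so the following is only a guess at its general shape: a jquery.ui.autocomplete "source" callback that sends a jsonp request to the fastsuggest service described below and maps the returned docs into menu items. the parameter values and property names mirror the request and response descriptions that follow; everything else is an assumption.

// hypothetical sketch of a "source" callback for jquery.ui.autocomplete;
// the real assignfastcomplete.js may differ.
function autosubjectexample(request, response) {
  jQuery.ajax({
    url: 'http://fast.oclc.org/searchfast/fastsuggest',
    dataType: 'jsonp',                      // the service supports a jsonp callback parameter
    data: {
      query: request.term,                  // the text typed so far
      queryIndex: 'suggestall',             // search all facets
      queryReturn: 'suggestall,idroot,auth,tag,type',
      suggest: 'autosubject',
      rows: 20
    },
    success: function(data) {
      response(jQuery.map(data.response.docs, function(doc) {
        return {
          label: doc.auth,                  // authorized heading shown in the menu
          value: doc.auth,                  // authorized heading entered on selection
          idroot: doc.idroot,               // fast control number
          tag: doc.tag,                     // marc authority tag (100, 150, 151, ...)
          type: doc.type,                   // "auth" or "alt" (see reference matched)
          matched: doc.suggestall           // the indexed form that actually matched
        };
      }));
    }
  });
}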
the particular option used for assignfast is the edgengramfilter.18 this option is used for autosuggest and has each word indexed one letter at a time, building to its entire length. the index for "cinema" would then contain "c," "ci," "cin," "cine," "cinem," and "cinema." solr handles utf-8 encoded unicode for both input and output. the assignfast indexes and queries are normalized using fast normalization19 to remove punctuation, diacritics, and capitalization. fast normalization is very similar to naco normalization, although in fast normalization the subfield indicator is replaced by a space and no commas retained.

assignfast is accessed using a rest request.20 rest requests consist of urls that can be invoked via either http post or get methods, either programmatically or via a web browser.

http://fast.oclc.org/searchfast/fastsuggest?&query=[query]&queryindex=[queryindex]&queryreturn=[queryreturn]&suggest=autosuggest&rows=[numrows]&callback=[callbackfunction]

where:

query: the query to search.
queryindex: the index corresponding to the fast facet. these include suggestall (all facets), suggest00 (personal names), suggest10 (corporate names), suggest11 (events), suggest30 (uniform titles), suggest50 (topicals), suggest51 (geographic names), and suggest55 (form/genre).
queryreturn: the information requested, as a comma-separated list. these include idroot (fast number); auth (authorized heading, formatted for display with "--" as subfield separator); type (alt or auth, indicating whether the match on the queryindex was to an authorized or see heading); tag (marc authority tag number for the heading: 100 = personal name, 150 = topical, etc.); raw (authorized heading, with subfield indicators; blank if identical to auth, i.e., no subfields); breaker (authorized heading in marcbreaker format; blank if identical to raw, i.e., no diacritics); and indicator (indicator 1 from the authorized heading).
numrows: maximum number of headings to return, restricted to 20.
callback: the callback function name for jsonp.

table 1. assignfast web service results description.

example response:

http://fast.oclc.org/searchfast/fastsuggest?&query=hog&queryindex=suggestall&queryreturn=suggestall%2cidroot%2cauth%2ctag%2ctype%2craw%2cbreaker%2cindicator&suggest=autosubject&rows=3&callback=testcall

yields the following response:

testcall({
  "responseheader":{
    "status":0,
    "qtime":148,
    "params":{
      "json.wrf":"testcall",
      "fl":"suggestall,idroot,auth,tag,type,raw,breaker,indicator",
      "q":"suggestall:hog",
      "rows":"3"}},
  "response":{"numfound":1031,"start":0,"docs":[
    {
      "idroot":"fst01140419",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"swine",
      "raw":"",
      "breaker":"",
      "suggestall":["hogs"]},
    {
      "idroot":"fst01140470",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"swine--housing",
      "raw":"swine$xhousing",
      "breaker":"",
      "suggestall":["hog houses"]},
    {
      "idroot":"fst00061534",
      "tag":100,
      "indicator":"1",
      "type":"auth",
      "auth":"hogarth, william, 1697-1764",
      "raw":"hogarth, william,$d1697-1764",
      "breaker":"",
      "suggestall":["hogarth, william, 1697-1764"]}]
}})

table 3. typical assignfast json data return.

the first response heading is the use for heading hogs, which has the authorized heading swine. the second is the use for heading for hog houses, which has the authorized heading swine--housing.
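to make the edgengramfilter and fast-normalization ideas above concrete, here is a small illustrative sketch (written in javascript rather than solr configuration, and not oclc's implementation) of normalizing a heading and expanding one word into its edge n-grams.

// illustrative only: fast-style normalization (drop diacritics, punctuation, case)
// followed by edge n-gram expansion of a single word.
function fastNormalize(heading) {
  return heading
    .normalize('NFD')                   // separate letters from combining diacritics
    .replace(/[\u0300-\u036f]/g, '')    // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, ' ')        // punctuation and subfield markers become spaces
    .replace(/\s+/g, ' ')
    .trim();
}

function edgeNgrams(word) {
  var grams = [];
  for (var i = 1; i <= word.length; i++) {
    grams.push(word.substring(0, i));   // "c", "ci", "cin", ..., "cinema"
  }
  return grams;
}

// edgeNgrams(fastNormalize('Cinéma')) -> ["c", "ci", "cin", "cine", "cinem", "cinema"]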
this authorized heading is also given in its raw form, including the $x subfield separator, which is unnecessary for the first heading. the third response matches the authorized heading for hogarth, william, 1697–1764, which is also given in its raw form. the breaker (marcbreaker) format is only added if it differs from the raw form, which is only when diacritics are present.

conclusions

subject assignment is a combination of intellectual and manual tasks. the assignfast web service can be easily integrated into existing cataloging interfaces, greatly reducing the manual effort required for good subject data entry and increasing the cataloger's productivity.

references

1. lois mai chan and edward t. o'neill, fast: faceted application of subject terminology, principles and applications (santa barbara, ca: libraries unlimited, 2010), http://lu.com/showbook.cfm?isbn=9781591587224.
2. oclc research activities associated with fast are summarized at http://www.oclc.org/research/activities/fast.
3. lois m. chan, library of congress subject headings: principles and application (westport, ct: libraries unlimited, 2005).
4. "autocomplete," wikipedia, last modified on october 1, 2013, http://en.wikipedia.org/wiki/autocomplete.
5. tony russell-rose, "designing search: as-you-type suggestions," ux magazine, article no. 828, may 16, 2012, http://uxmag.com/articles/designing-search-as-you-type-suggestions.
6. david ward, jim hahn, and kirsten feist, "autocomplete as research tool: a study on providing search suggestions," information technology & libraries 31, no. 4 (december 2012), 6–19.
7. jon jermey, "automated indexing: feeding the autocomplete monster," indexer 28, no. 2 (june 2010), 74–75.
8. holger bast, christian w. mortensen, and ingmar weber, "output-sensitive autocompletion search," information retrieval 11 (august 2008), 269–286.
9. elías tzoc, "re-using today's metadata for tomorrow's research: five practical examples for enhancing access to digital collections," journal of electronic resources librarianship 23, no. 1 (january–march 2011).
10. holger bast and ingmar weber, "type less, find more: fast autocompletion search with a succinct index," sigir '06 proceedings of the 29th annual international acm sigir conference on research and development in information retrieval (new york: acm, 2006), 364–71.
11. demian katz, ralph levan, and ya'aqov ziso, "using authority data in vufind," code4lib journal 11 (june 2011).
12. edward t. o'neill, rick bennett, and kerre kammerer, "using authorities to improve subject searches," in maja žumer and k. r e and edward t. o'neill, eds., "beyond libraries—subject metadata in the digital environment and semantic web," special issue, cataloging & classification quarterly 52, no. 1/2 (in press).
13. "marcmaker and marcbreaker user's manual," library of congress, network development and marc standards office, revised november 2007, http://www.loc.gov/marc/makrbrkr.html.
14. "oclc developers network—assignfast," submitted september 28, 2012, http://oclc.org/developer/services/assignfast [page not found].
15. "jquery autocomplete," accessed october 1, 2013, http://jqueryui.com/autocomplete.
16. "apache lucene—apache solr," accessed october 1, 2013, http://lucene.apache.org/solr.
17. "apache tomcat," accessed october 30, 2013, http://tomcat.apache.org.
18. "solr wiki—analyzers, tokenizers, tokenfilters," last edited october 29, 2013, http://wiki.apache.org/solr/analyzerstokenizerstokenfilters.
19. thomas b. hickey, jenny toves, and edward t. o'neill, "naco normalization: a detailed examination of the authority file comparison rules," library resources & technical services 50, no. 3 (2006), 166–72.
20. "representational state transfer," wikipedia, last modified on october 21, 2013, http://en.wikipedia.org/wiki/representational_state_transfer.

president's column

bonnie postlethwaite

information technology and libraries | june 2007

i write my final president's column a month after the midwinter meeting in seattle. you will read it as preparations for the ala annual conference in washington, d.c. are well underway. despite that disconnect in time, i am confident that the level of enthusiasm will continue uninterrupted between the two events. indeed, the midwinter meeting was highly charged with positive energy and excitement. the feelings are reignited if you listen to the numerous podcasts now found on the lita blog. the lita bloggers and podcasters were omnipresent reporting on all of the meetings and recording the musings of the lita top tech trendsters. by the time you have read this you will have also, hopefully, cast your ballot for lita officers and directors after having had the opportunity to listen to brief podcast interviews with the candidates. the lita board approved the election podcasts at the annual conference in new orleans. thanks to the collaborative efforts of the nominating committee and the bigwig members, we have this new input into our voting decision-making. the most exciting aspects of the midwinter meeting were the face-to-face networking opportunities that make lita so great. the lita happy hour crowd filled the six arms bar and lit it up with the wonderful lita glow badges. what was particularly gratifying to me was the number of new lita members alongside those of us who have been around longer than we care to count. the networking that went on there was phenomenal! the other important networking opportunity for lita members was the lita town meeting led by lita vice president mark beatty. the room was packed with eager members ready to brainstorm about what they think lita should be doing after consuming a wonderful breakfast. lita's sponsored emerging leader, michelle boule, and mark have collated the findings and will be working with the other emerging leaders to fine-tune a direction. the podcast interview of michelle and mark is an excellent summary of what you can expect in the next year when mark is president. as stated earlier, this is my last president's column, which means my term is winding down. using lita's strategic plan as a guide, i have worked with many of you in lita to ensure that we have a structure in place that allows us to be more adaptable to the rapidly changing world and to make sure that lita is relevant to lita members 365 x 24 x 7 and not just at conferences and lita national forum. attracting and retaining new members is critical for the health of any organization and in that vein, mark and i have used the ala emerging leaders program as a jumping off point to work with lita's emerging leaders.
the bigwig group is fomenting with energy and excitement as they rally bloggers and have this past year launched the podcasting initiative and the lita wiki. all of these things are making it easier for members to communicate about issues of interest in their work as well as to conduct lita business. the lita blog had over nine thousand downloads of its podcasts in the first three weeks after midwinter, which confirms the desire for these types of communications! i appointed two task forces that provided recommendations to the lita board at midwinter. the assessment and research task force has recommended that a permanent committee be established to monitor the collection of feedback and assessment data on lita programs and services. having an established assessment process will enable the board to know how well we are accomplishing our strategic plan and to keep us on the correct course to meet membership needs. the education working group has recommended the merger of two committees, the education and regional institutes committees, into one education committee. this merged committee will develop a variety of educational opportunities including online and face-to-face sessions. we hope to have both of these committees up and going later in 2007. happily, the feedback from the town meeting parallels the recommendations of the task forces. the board will be revisiting the strategic plan at the annual conference using information gathered at the town meeting. we will also be looking at what new services we should be initiating. all arrows seem to be pointing towards more educational and networking opportunities, both virtual and in person. i anticipate that lita members will see some great new things happening in the next year. i have very much enjoyed the opportunity to serve as the lita president this past year. the best part has been getting to know so many lita members who have such creative ideas and who roll up their sleeves and dig in to get the work done. i am very grateful for everyone who has volunteered their time and talents to make lita such a great organization. bonnie postlethwaite (postlethwaiteb@umkc.edu) is lita president 2006/2007 and associate dean of libraries, university of missouri–kansas city.
today's large academic libraries struggle, there is, nonetheless, room for criticism of library priorities. this study must be viewed as only a first step (largely tentative and exploratory) in relating automation with service attitudes. it suggests that online systems may be associated with managers more positive in their view of the management role and more positive in their attitudes toward users than batch- and manual-system managers. further research would be useful at this point to compare levels of automation (manual, batch, and online) with circulation-staff service attitudes or those of patrons using the systems.
statistics on headings in the marc file
sally h. mccallum and james l. godwin: network development office, library of congress, washington, d.c.
in designing an automated system, it is important to understand the characteristics of the data that will reside in the system. work is under way in the network development office of the library of congress (lc) that focuses on the design requirements of a nationwide authority file. in support of this work, statistics relating to headings that appear on the bibliographic records in the lc marc ii files were gathered. these statistics provide information on characteristics of headings and on the expected sizes and growth rates of various subsets of authority files. this information will assist in making decisions concerning the contents of authority files for different types of headings and the frequency of update required for the various file subsets. the national commission on libraries and information science supported this work. use of these statistics to assist in system design is largely system-dependent; however, some general implications are given in the last section of this paper. in general, counts were made of the number of bibliographic records, headings that appear in those records, and distinct headings that appear on the records. the statistics were broken down by year, by type of heading, and by file. in this paper, distinct headings are those left in a file after removal of duplicates. distinctness will not be used to imply that a heading appears only once in a source bibliographic file, although distinct headings may in fact have only a single occurrence. thus, a file of records containing the distinct headings from a set of bibliographic records is equivalent in size to a marc authority file of the headings in those bibliographic records.
methodology
these statistics were derived from four marc ii bibliographic record files maintained internally at lc: books, serials, maps, and films. the files contain updated versions of all marc records that have been distributed by lc on the books, serials, maps, and films tapes from 1969 through october 1979, and a few records that were then in the process of distribution. the files do not contain cip records. a total of 1,336,182 bibliographic records were processed, including 1,134,069 from the books file, 90,174 from the serials file, 60,758 from the maps file, and 51,176 from the films file. a file of special records, called access point (ap) records, was created that contains one record for the contents of each occurrence of the following fields in the bibliographic records:
type of heading | heading fields
personal name | 100, 700, 400, 800, 600
corporate name | 110, 710, 410, 810, 610
conference name | 111, 711, 411, 811, 611
topical subject | 650
geographic subject | 651
uniform title | 130, 730, 830, 630
only the 6xx subject fields that contained lc subject headings (i.e., second indicator = 0) were selected as ap records. the main entry data string was substituted for the pronoun in the series (4xx) fields that contained pronouns. the ap records also contained information from the bibliographic records that assisted in making the counts, such as the date of entry of the record on the file, the identity of the type of bibliographic file, and the language of the bibliographic record.
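the selection of access points from the bibliographic records can be pictured with a short present-day sketch; it assumes the pymarc library and uses invented names (ap_fields, access_points), so it stands in for, rather than reproduces, the program lc actually ran.

```python
from pymarc import MARCReader  # assumption: pymarc is available for reading marc records

# heading fields by type of heading, as listed above
AP_FIELDS = {
    "personal name":      ["100", "700", "400", "800", "600"],
    "corporate name":     ["110", "710", "410", "810", "610"],
    "conference name":    ["111", "711", "411", "811", "611"],
    "uniform title":      ["130", "730", "830", "630"],
    "topical subject":    ["650"],
    "geographic subject": ["651"],
}

def access_points(path):
    """yield (heading type, tag, heading string) for every qualifying field
    in one bibliographic file, one tuple per ap record."""
    with open(path, "rb") as fh:
        for record in MARCReader(fh):
            if record is None:          # skip records pymarc could not parse
                continue
            for heading_type, tags in AP_FIELDS.items():
                for field in record.get_fields(*tags):
                    # keep only lc subject headings: second indicator '0' on 6xx fields
                    if field.tag.startswith("6") and field.indicators[1] != "0":
                        continue
                    yield heading_type, field.tag, field.format_field()
```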
a third file was derived from the ap file that contained a normalized character string for each ap record heading. these normalized ap records were used to produce the counts of distinct headings by clustering like data strings. normalization included conversion of all characters to uppercase, and masking of diacritics, marks of punctuation, and other characters that do not determine the distinctness of a heading, but would interfere with machine determination of uniqueness. the subfields included in the normalized string, hence used for all heading comparisons, are given below. only use-dependent subfields, such as the relator subfield, and those that belonged to title clusters in author/title headings were excluded. examples of the ap file field contents and the normalized forms are:
ap field contents | normalized form
chuang-tzu; chuang-tzu | chuang tzu
[blaeu, joan] 1596-1673; blaeu, joan. 1596-1673; blaeu, joan, 1596-1673 | blaeu joan 1596 1673
byron, george gordon noel byron, baron, 1788-1824; byron, george gordon noel byron, baron, 1788.1824 | byron george gordon noel byron baron 1788 1824
distinct headings for this study were determined by comparing on the following subfields:
type of heading | subfields
personal name | a, b, c, d
corporate name | a, b, k, f, p, s, g
conference name | a, q, e
topical subject | a, b, x, y, z
geographic subject | a, b, x, y, z
all occurrences of repeating subfields were included. the relator data subfields were dropped from personal and corporate name headings, as were the title subfields in author/title headings. a separate study will examine the occurrence of author/title headings. approximately 8 percent of the name headings in the files carry title subfields: 6 percent are series and 2 percent are author/title subjects or added entries. two types of distinct heading counts were generated for topical and geographic subject headings. one takes account only of main terms, the a and b subfields, excluding all subject subdivisions. the other compared the complete heading strings, including subject subdivisions.
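a minimal sketch of that normalization and clustering step follows, assuming only the python standard library; normalize_heading and cluster are illustrative names, and the rules shown (case folding, dropping diacritic marks, masking punctuation) only approximate the comparison rules actually used.

```python
import unicodedata
from collections import defaultdict

def normalize_heading(heading: str) -> str:
    """fold case, drop diacritic marks, and mask punctuation so that
    like heading strings cluster on the same normalized form."""
    decomposed = unicodedata.normalize("NFD", heading)
    chars = []
    for c in decomposed:
        if unicodedata.combining(c):      # discard diacritic marks
            continue
        chars.append(c.upper() if c.isalnum() else " ")
    return " ".join("".join(chars).split())

def cluster(headings):
    """group raw heading strings by normalized form; each cluster is one distinct heading."""
    clusters = defaultdict(list)
    for h in headings:
        clusters[normalize_heading(h)].append(h)
    return clusters

forms = ["blaeu, joan. 1596-1673", "blaeu,joan, 1596-1673", "[blaeu, joan] 1596-1673"]
print(list(cluster(forms)))   # -> ['BLAEU JOAN 1596 1673']
```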
characteristics of the files
the four bibliographic files from which the statistics were derived were begun in different years and are of unequal size. table 1 presents the number of bibliographic records added to each of the marc files by the year that the record was first entered into the file. the records added in the first months of 1979 have been eliminated from tables 1-3, thus the total number of records under consideration is 1,210,809. in the combined file, the records for books dominate the contributions from other forms of materials, representing 85 percent of the combined file records. after the addition of the films and serials records in 1972 and 1973 the total number of records added each year leveled off to around 115,000 but jumped to an average of slightly more than 150,000 records per year following the addition of major non-english roman alphabet language records in 1976.
table 1. number of records added to each file by year
year entered | book | serial | map | film | total
1968 | 11,812 | 0 | 0 | 0 | 11,812
1969 | 43,874 | 0 | 1,104 | 0 | 44,978
1970 | 86,004 | 0 | 3,487 | 0 | 89,491
1971 | 105,390 | 0 | 8,857 | 0 | 114,247
1972 | 73,437 | 0 | 4,665 | 6,280 | 84,382
1973 | 92,512 | 3,720 | 5,566 | 8,929 | 110,727
1974 | 99,004 | 10,682 | 6,246 | 8,457 | 124,389
1975 | 86,527 | 15,866 | 6,721 | 8,604 | 117,718
1976 | 120,106 | 19,098 | 6,876 | 5,432 | 151,512
1977 | 140,011 | 17,999 | 7,011 | 4,797 | 169,818
1978 | 169,044 | 12,643 | 5,584 | 4,464 | 191,735
total | 1,027,721 | 80,008 | 56,117 | 46,963 | 1,210,809
table 2. numbers of headings and distinct name headings added to all files by year
year entered | personal names | corporate names | conference names | distinct personal names | distinct corporate names | distinct conference names
1968 | 14,526 | 3,138 | 155 | 12,620 | 2,139 | 143
1969 | 53,134 | 21,206 | 1,027 | 39,184 | 9,364 | 909
1970 | 104,365 | 42,798 | 2,175 | 63,037 | 14,286 | 1,769
1971 | 129,617 | 57,496 | 2,742 | 64,029 | 15,216 | 2,158
1972 | 91,040 | 45,768 | 1,942 | 41,246 | 9,891 | 1,402
1973 | 118,188 | 57,847 | 2,625 | 48,703 | 12,653 | 1,862
1974 | 127,588 | 73,303 | 2,972 | 51,623 | 17,129 | 1,983
1975 | 113,622 | 76,417 | 2,519 | 50,291 | 18,135 | 1,742
1976 | 154,718 | 88,207 | 3,454 | 73,182 | 23,120 | 2,306
1977 | 182,860 | 87,985 | 3,487 | 89,353 | 23,906 | 2,333
1978 | 218,535 | 97,042 | 4,192 | 99,780 | 24,280 | 2,831
total | 1,308,193 | 651,207 | 27,290 | 633,048 | 170,119 | 19,438
table 3. numbers of subject headings and distinct subject headings added to all files by year
year entered | topical headings | geographic headings | distinct topical (first terms only) | distinct geographic (first terms only) | distinct topical (full headings) | distinct geographic (full headings)
1968 | 10,615 | 1,857 | 4,390 | 489 | 7,775 | 1,512
1969 | 45,161 | 9,047 | 8,104 | 1,980 | 23,617 | 5,426
1970 | 89,304 | 21,054 | 8,170 | 4,263 | 34,526 | 10,179
1971 | 115,220 | 31,278 | 6,853 | 5,417 | 36,689 | 12,862
1972 | 92,247 | 20,760 | 4,236 | 2,597 | 26,201 | 7,074
1973 | 121,161 | 27,890 | 4,460 | 3,105 | 33,061 | 9,819
1974 | 137,843 | 31,814 | 4,524 | 3,553 | 39,262 | 11,413
1975 | 130,980 | 30,650 | 4,203 | 3,417 | 40,129 | 11,818
1976 | 168,840 | 39,886 | 5,125 | 4,142 | 55,468 | 15,472
1977 | 185,331 | 44,973 | 5,718 | 4,194 | 59,529 | 16,676
1978 | 222,565 | 49,923 | 7,151 | 4,034 | 69,856 | 17,855
total | 1,319,267 | 309,132 | 62,934 | 37,191 | 426,113 | 120,106
the increase is noticeable primarily in the books and serials files since the maps file had been adding those languages since 1969 and only a limited number of non-english-language audiovisual materials are cataloged. the unusually large number of records added to the books file in 1971 resulted from a special project to add retrospective titles to the file. the large increase in books records in 1978 was due to the comarc project, in which retrospective lc records that had been converted to machine-readable form by other libraries were contributed to the lc marc file. approximately 12,000 comarc records were added in 1977 and 28,000 in 1978. the fall in numbers of film records produced in 1976-1978 reflects a general fall in production of instructional films in the united states. counts of items cataloged that are compiled by lc processing services from catalogers' statistics sheets show that lc cataloged approximately 225,000 titles in 1978; thus, approximately 73 percent of lc cataloging is currently going into machine-readable form. the principal exclusions are records for most nonroman material (only nonroman records for maps have been transliterated and added since 1969) and a few records for music, sound recordings, incunabula, and microforms.
the portion being put into machine-readable form should rise significantly as the romanized records for items in several nonroman alphabets are added in the next year.
name headings
table 2 presents the number of occurrences of name headings in the marc bibliographic files and the number of distinct name headings, both by type of heading and by year. the number of distinct headings that were new to the file in a year was determined by comparing the headings added in a given year against those added in all previous years. it is not surprising to find that 66 percent of name-heading occurrences are personal names, 33 percent are corporate, and only 1.4 percent are conference. the figures shift when considering the distinct names, where 77 percent are personal and only 21 percent are corporate. looking at the total figures in table 2, while 1,308,193 of the headings that appeared on the records were personal names, only 633,048 or 48 percent of these were distinct. of the rest, 52 percent were duplicates of the distinct headings. similarly, 26 percent of corporate names were distinct, with 74 percent being duplicates; and 71 percent of conference names were distinct, with only 29 percent being duplicates. in 1968, 87 percent and 68 percent of personal and corporate names, respectively, were distinct, i.e., 13 percent and 32 percent "had been used previously" when they appeared on a bibliographic record during the year. as the base file of names grows, the percentage of names appearing on new records but which "had been used previously" rises, to 60 percent and 77 percent in 1974. while the figures reported in table 2 indicate that the percentage of headings used that were repeats fell slightly again in 1977 (51 percent and 73 percent), this is probably due to the influx of new names with the addition of new languages in 1976-77. additional statistics gathered on english-language items show the percentage of repeating headings becoming steady after 1974.
subject headings
statistics concerning distinct topical and geographical subject headings were collected for main terms, excluding subdivisions, and for full subject heading strings. table 3 gives the numbers of headings and the numbers of distinct headings of each type found in the marc file. looking at the total figures, only 4.8 percent of topical first terms are distinct, the rest are duplicates. this indicates an average occurrence of 20.8 times for each first term. slightly more, 12 percent, of the geographic first terms are distinct. when the full headings with topical, period, form, and geographic subdivisions are considered, the percentage of headings that are distinct rises to 32.3 percent for topical subjects and 38.8 percent for geographic subjects. thus, 67.3 percent of topical and 61.2 percent of geographic are duplicates of existing headings. in the yearly figures, subject headings show the same tendency as name headings in that the percentages of headings that appear on new records but which "had been previously used" rises as the stock of headings increases and then levels off. subjects were also affected by the addition of other roman alphabet languages in 1976-77 but not to a very large degree. for all access points, name headings and full string subject headings, name headings account for 55 percent of the headings that occur in the bibliographic records, with only 45 percent attributable to topical and geographical headings.
it should be noted that 12 percent of the name headings that appear on the bibliographic records are names used as subjects.
frequencies of occurrence
counts were also made of the frequency with which name headings occurred in the bibliographic files. table 4 summarizes the frequency data: 66 percent of distinct personal names, 62 percent of distinct corporate names, and 84 percent of distinct conference names occur only once in the files. the percent of corporate names with single occurrences is surprisingly close to that for personal; however, the percent of names having multiple occurrences falls more slowly for corporate than for personal names. while 5.47 percent of corporate names occur ten or more times, only 1.92 percent of personal names occur ten or more times. the figures for personal names roughly correspond to those obtained by william potter from a sample taken from the main catalog at the university of illinois at urbana-champaign. that study showed 63.5 percent of personal names occurred only once.1
table 4. frequency of occurrence of name headings in all files
number of occurrences | distinct personal names | percent | distinct corporate names | percent | distinct conference names | percent
1 | 456,328 | 65.65 | 116,250 | 62.02 | 18,021 | 83.90
2 | 119,681 | 17.22 | 30,185 | 16.10 | 2,049 | 9.54
3 | 46,247 | 6.65 | 11,563 | 6.17 | 587 | 2.73
4 | 23,951 | 3.45 | 6,814 | 3.64 | 289 | 1.35
5 | 13,820 | 1.99 | 4,109 | 2.19 | 163 | .76
6 | 8,790 | 1.26 | 2,958 | 1.58 | 98 | .46
7 | 5,827 | .84 | 2,175 | 1.16 | 56 | .26
8 | 4,056 | .58 | 1,673 | .89 | 48 | .22
9 | 2,998 | .43 | 1,395 | .74 | 36 | .17
10 | 2,153 | .31 | 1,037 | .55 | 18 | .08
11-13 | 4,116 | .59 | 2,180 | 1.16 | 44 | .20
14-20 | 3,748 | .54 | 2,632 | 1.40 | 41 | .19
21-50 | 2,678 | .39 | 2,901 | 1.55 | 23 | .11
51-100 | 448 | .06 | 936 | .50 | 4 | .02
101-200 | 149 | .02 | 374 | .20 | 2 | .01
201-300 | 47 | .01 | 109 | .06 | 1 | .00
301-400 | 19 | .00 | 46 | .02 | 0 | .00
401-500 | 11 | .00 | 21 | .01 | 0 | .00
501-1000 | 5 | .00 | 53 | .03 | 0 | .00
1001+ | 2 | .00 | 18 | .01 | 0 | .00
total | 695,074 | 99.99 | 187,429 | 99.98 | 21,480 | 100.00
the number of occurrences of different types of headings are compared in figure 1. the bars show the numbers of personal, corporate, conference, topical, and geographic headings that appear in the bibliographic files. the shaded areas represent the number of headings that are distinct, thus the upper part of each bar represents additional occurrences of the headings from the shaded area. for personal, corporate, and conference headings a further distinction is made between distinct headings that occur only once, the crosshatched area, and those that have multiple occurrences. thus the multiple occurrences of corporate names may be seen to come from a small number of distinct corporate headings, as was indicated by the slow decrease of the multiple-heading occurrence rate (i.e., a small group of corporate names have a very large number of occurrences).
fig. 1. number of headings by type. (bar chart of personal, corporate, conference, topical, and geographic headings; shaded areas show distinct headings, and crosshatched areas show distinct headings that occur only once. values printed on the chart include 1,444,726 for personal names, 30,417 for conference names, and 1,468,804 for topical subjects.)
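the frequency tabulation behind table 4 is easy to reproduce in outline; a small sketch, assuming the normalized heading strings from the clustering step above, is given below (occurrence_distribution is an invented name, not lc's program).

```python
from collections import Counter

def occurrence_distribution(normalized_headings):
    """tabulate how many distinct headings occur once, twice, and so on,
    in the manner of table 4, from a stream of normalized heading strings."""
    per_heading = Counter(normalized_headings)        # occurrences of each distinct heading
    distribution = Counter(per_heading.values())      # distinct headings per occurrence count
    total_distinct = len(per_heading)
    return {count: (n, round(100.0 * n / total_distinct, 2))
            for count, n in sorted(distribution.items())}

# toy example: three distinct headings, one of which occurs twice
sample = ["SMITH JOHN", "SMITH JOHN", "BLAEU JOAN 1596 1673", "CHUANG TZU"]
print(occurrence_distribution(sample))   # -> {1: (2, 66.67), 2: (1, 33.33)}
```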
file growth
as a bibliographic file grows and the stock of names and subjects that are contained in the associated authority file increases, the number of new-to-the-file headings that are required for the new bibliographic records would be expected to fall. figure 2 illustrates that tendency and shows that there is a leveling off of the number of new-to-the-file headings per new bibliographic record after the bibliographic file reaches a certain size. for example, after approximately 700,000 bibliographic records are in the file, for every additional 100 bibliographic records approximately 298 name and subject headings will be assigned, and, of these, approximately 53 will be new personal names, 14 new corporate names, 2 new conference names, 35 new topical subjects, and 10 new geographic subjects; the remaining 184 headings used will already be established in the authority file. thus after a certain bibliographic file size is reached, the growth of the authority file is approximately a linear function of the growth of the bibliographic file.
fig. 2. number of new headings per record for all files. (personal names, corporate names, conference names, topical subjects, and geographic subjects plotted against the number of bibliographic records, in thousands, from 100 to 1,300; new headings per record range from roughly 0.3 to 1.2.)
implications
the reoccurrence frequency of headings in a bibliographic file is often cited as a factor in designing bibliographic and authority-file configurations. discussion centers on the necessity of carrying authority records for headings that occur only once in a bibliographic file. with reference to the name-heading data in table 4 and figure 1, carrying authority records only for headings that occur more than once could potentially reduce the size of the authority file from that indicated by the whole shaded area (including shaded and crosshatched) to the plain shaded area, i.e., from 903,983 records to 310,123, a 66 percent decrease. controlling multiple occurrences of a heading is, however, only one role of the authority record. more important perhaps is the control of cross-references connected with the heading. preliminary work with a random sample of personal names in the lc file indicates that less than 17 percent of personal names require cross-references. thus the personal name headings that occur only once but would require authority records because of cross-references could be less than 17 percent. the frequency data combined with reference structure data could have a significant impact on design. out of a total of 695,074 personal names in the authority files associated with the marc bibliographic files examined here, 456,328, or 66 percent, occur only once. of these, fewer than 77,575 would be expected to have cross-references; thus the name-authority file for personal names could be reduced in size from 695,074 records to 316,321, a 55 percent decrease. if separate authority records are a system requirement, the occurrence figures might then be useful for defining configurations that employ machine-generated provisional records for single-occurrence headings that do not have reference structures or that simplify in other ways the treatment of these headings. these figures may also be useful in making decisions on the addition of retrospective authority records to the automated files.
reference
1. william gray potter, "when names collide: conflict in the catalog and aacr2," library resources & technical services 24:7 (winter 1980).
rlin and oclc as reference tools
douglas jones: university of arizona, tucson.
the central reference department (social science, humanities, and fine arts) and the science-engineering reference department at the university of arizona library are currently evaluating the oclc and rlin systems as reference tools, to see if their use can significantly improve the effectiveness and efficiency of providing reference service. a significant number of the questions received by our librarians, and presumably by librarians elsewhere, involve incomplete or inaccurately cited references to monographs, conference proceedings, government documents, technical reports, and monographic serials. if by using a bibliographic utility a librarian can identify or verify an item not found in printed sources, then effectiveness has been improved. once a complete and accurate description of the item is found, it is a relatively simple task to determine whether or not the library has the item, and if not, to request it through interlibrary loan. additionally, if the efficiency of the librarian can be improved by reducing the amount of time required to verify or identify a requested item, then the patron, the library, and, in our case, the taxpayer, have been better served. the promise of near-immediate response from a computer via an online interactive terminal system is clearly beguiling when compared to the relatively time-consuming searching required with printed sources, which frequently provide only a limited number of access points and often become available weeks, months, or even years after the items they list. we realize, of course, that the promise of instantaneous electronic information retrieval is limited by a variety of factors, and presently we view access to rlin and oclc as potentially powerful adjuncts to, not replacements for, printed reference sources. given that rlin and oclc have databases and software geared to known-item searches for catalog card production, our evaluation attempts to document their usefulness in reference service. a preliminary study conducted during the spring semester of 1980-81 indicated that approximately 50 percent of the questionable citations requiring further bibliographic verification could be identified on oclc or rlin. the time required was typically five minutes or less. successful verification using printed indexes to identify the same items ranged from 20 percent in the central reference department to 50 percent in science-engineering. time required per item averaged approximately fifteen minutes. based on our findings, we plan a revised and more thorough test during the fall semester of 1981-82, which will include an assessment of the enhancements to the
engine of innovation: building the high-performance catalog
will owen and sarah c. michalak
information technology and libraries | june 2015
abstract
numerous studies have indicated that sophisticated web-based search engines have eclipsed the primary importance of the library catalog as the premier tool for researchers in higher education. we submit that the catalog remains central to the research process.
through  a  series  of  strategic   enhancements,  the  university  of  north  carolina  at  chapel  hill,  in  partnership  with  the  other   members  of  the  triangle  research  libraries  network  (trln),  has  made  the  catalog  a  carrier  of   services  in  addition  to  bibliographic  data,  facilitating  not  simply  discovery,  but  also  delivery  of  the   information  researchers  seek.   introduction in  2005,  an  oclc  research  report  documented  what  many  librarians  already  knew—that  the   library  webpage  and  catalog  were  no  longer  the  first  choice  to  begin  a  search  for  information.  the   report  states,   the  survey  findings  indicate  that  84  percent  of  information  searches  begin  with  a  search   engine.  library  web  sites  were  selected  by  just  1  percent  of  respondents  as  the  source  used  to   begin  an  information  search.  very  little  variability  in  preference  exists  across  geographic   regions  or  u.s.  age  groups.  two  percent  of  college  students  start  their  search  at  a  library  web   site.1   in  2006  a  report  by  karen  calhoun,  commissioned  by  the  library  of  congress,  asserted,  “today  a   large  and  growing  number  of  students  and  scholars  routinely  bypass  library  catalogs  in  favor  of   other  discovery  tools.  .  .  .  the  catalog  is  in  decline,  its  processes  and  structures  are  unsustainable,   and  change  needs  to  be  swift.”2     ithaka  s+r  has  conducted  national  faculty  surveys  triennially  since  2000.  summarizing  the  2000– 2006  surveys,  roger  schonfeld  and  kevin  guthrie  stated,  “when  the  findings  from  2006  are   compared  with  those  from  2000  and  2003,  it  becomes  evident  that  faculty  perceive  themselves  as   becoming  decreasingly  dependent  on  the  library  for  their  research  and  teaching  needs.”3   furthermore,  it  was  clear  that  the  “library  as  gateway  to  scholarly  information”  was  viewed  as   decreasingly  important.  the  2009  survey  continued  the  trend  with  even  fewer  faculty  seeing  the       will  owen  (owen@email.unc.edu)  is  associate  university  librarian  for  technical  services  and   systems  and  sarah  c.  michalak  (smichala@email.unc.edu)  is  university  librarian  and  associate   provost  for  university  libraries,  university  of  north  carolina  at  chapel  hill.     engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   6   gateway  function  as  critical.  these  results  occurred  in  a  time  when  electronic  resources  were   becoming  increasingly  important  and  large  google-­‐like  search  engines  were  rapidly  gaining  in   use.4     these  comments  extend  into  the  twenty-­‐first  century  more  than  thirty  years  of  concern  about  the   utility  of  the  library  catalog.  through  the  first  half  of  this  decade  new  observations  emerged  about   patron  perceptions  of  catalog  usability.  even  after  migration  from  the  card  to  the  online  catalog   was  complete,  the  new  tool  represented  primarily  the  traditionally  cataloged  holdings  of  a   particular  library.  providing  direct  access  to  resources  was  not  part  of  the  catalog’s  mission.   manuscripts,  finding  aids,  historical  photography,  and  other  special  collections  were  not  included   in  the  traditional  catalog.  
journal  articles  could  only  be  discovered  through  abstracting  and   indexing  services.  as  these  discovery  tools  began  their  migration  to  electronic  formats,  the   centrality  of  the  library’s  bibliographic  database  was  challenged.   the  development  of  google  and  other  sophisticated  web-­‐based  search  engines  further  eclipsed  the   library’s  bibliographic  database  as  the  first  and  most  important  research  tool.  yet  we  submit  that   the  catalog  database  remains  a  necessary  fixture,  continuing  to  provide  access  to  each  library’s   particular  holdings.  while  the  catalog  may  never  regain  its  pride  of  place  as  the  starting  point  for   all  researchers,  it  still  remains  an  indispensable  tool  for  library  users,  even  if  it  may  be  used  only   at  a  later  stage  in  the  research  process.   at  the  university  of  north  carolina  at  chapel  hill,  we  have  continued  to  invest  in  enhancing  the   utility  of  the  catalog  as  a  valued  tool  for  research.  librarians  initially  reasoned  that  researchers   still  want  to  find  out  what  is  available  to  them  in  their  own  campus  library.  gradually  they  began   to  see  completely  new  possibilities.  to  that  end,  we  have  committed  to  a  program  that  enhances   discovery  and  delivery  through  the  catalog.  while  most  libraries  have  built  a  wide  range  of   discovery  tools  into  their  home  pages—adding  links  to  databases  of  electronic  resources,  article   databases,  and  google  scholar—we  have  continued  to  enhance  both  the  content  to  be  found  in  the   primary  local  bibliographic  database  and  the  services  available  to  students  and  researchers  via  the   interface  to  the  catalog.   in  our  local  consortium,  the  triangle  research  libraries  network  (trln),  librarians  have   deployed  the  search  and  faceting  services  of  endeca  to  enrich  the  discovery  interfaces.  we  have   gone  beyond  augmenting  the  catalog  through  the  addition  of  marcive  records  for  government   documents,  by  including  encoded  archival  description  (ead)  finding  aids  and  selected  (and  ever-­‐ expanding)  digital  collections  that  are  not  easily  discoverable  through  major  search  engines.  we   have  similarly  enhanced  services  related  to  the  discovery  and  delivery  of  items  listed  in  the   bibliographic  database,  including  not  only  common  features  like  the  ability  to  export  citations  in  a   variety  of  formats  but  also  more  extensive  services  such  as  document  delivery,  an  auto-­‐suggest   feature  that  maximizes  use  of  library  of  congress  subject  headings  (lcsh),  and  the  ability  to   submit  cataloged  items  to  be  processed  for  reserve  reading.     information  technology  and  libraries  |  june  2015     7   both  students  and  faculty  have  embraced  e-­‐books,  and  in  adding  more  than  a  million  such  titles  to   the  unc-­‐chapel  hill  catalog  we  continue  to  blend  discovery  and  delivery,  but  now  on  a  very  large   scale.  
coupling  catalog  records  with  a  metadata  service  that  provides  book  jackets,  tables  of   contents,  and  content  summaries,  cataloging  geographic  information  systems  (gis)  data  sets,  and   adding  live  links  to  the  finding  aids  for  digitized  archival  and  manuscript  collections  have  further   enhanced  the  blended  discovery/delivery  capacity  of  the  catalog.   we  have  also  leveraged  the  advantages  of  operating  in  a  consortial  environment  by  extending  the   discovery  and  delivery  services  among  the  members  of  trln  to  provide  increased  scope  of   discovery  and  shared  processing  of  some  classes  of  bibliographic  records.  trln  comprises  four   institutions  and  content  from  all  member  libraries  is  discoverable  in  a  combined  catalog   (http://search.trln.org).  printed  material  requested  through  this  combined  catalog  is  often   delivered  between  trln  libraries  within  twenty-­‐four  hours.   at  unc,  our  search  logs  show  that  use  of  the  catalog  increases  as  we  add  new  capacity  and  content.   these  statistics  demonstrate  the  catalog’s  continuing  relevance  as  a  research  tool  that  adds  value   above  and  beyond  conventional  search  engines  and  general  web-­‐based  information  resources.  in   this  article  we  will  describe  the  most  important  enhancements  to  our  catalog,  include  data  from   search  logs  to  demonstrate  usage  changes  resulting  from  these  enhancements,  and  comment  on   potential  future  developments.   literature  review   an  extensive  literature  discusses  the  past  and  future  of  online  catalogs,  and  many  of  these   materials  themselves  include  detailed  literature  reviews.  in  fact,  there  are  so  many  studies,   reviews,  and  editorials,  it  becomes  clear  that  although  the  online  catalog  may  be  in  decline,  it   remains  a  subject  of  lively  interest  to  librarians.  two  important  threads  in  this  literature  report  on   user-­‐query  studies  and  on  other  usability  testing.  though  there  are  many  earlier  studies,  two   relatively  recent  articles  analyze  search  behavior  and  provide  selective  but  helpful  literature   surveys.5     there  are  many  efforts  to  define  directions  for  the  catalog  that  would  make  it  more  web-­‐like,  more   google-­‐like,  and  thus  more  often  chosen  for  search,  discovery,  and  access  by  library  patrons.   these  articles  aim  to  define  the  characteristics  of  the  ideal  catalog.  charles  hildreth  provides  a   benchmark  for  these  efforts  by  dividing  the  history  of  the  online  catalog  into  three  generations.   from  his  projections  of  a  third  generation  grew  the  “next  generation  catalog”—really  the  current   ideal.  he  called  for  improvement  of  the  second-­‐generation  catalog  through  an  enhanced  user-­‐ system  dialog,  automatic  correction  of  search-­‐term  spelling  and  format  errors,  automatic  search   aids,  enriched  subject  metadata  in  the  catalog  record  to  improve  search  results,  and  the   integration  of  periodical  indexes  in  the  catalog.  
as  new  technologies  have  made  it  possible  to   achieve  these  goals  in  new  ways,  much  of  what  hildreth  envisioned  has  been  accomplished.6       engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   8   second-­‐generation  catalogs,  anchored  firmly  in  integrated  library  systems,  operated  throughout   most  of  the  1980s  and  the  1990s  without  significant  improvement.  by  the  mid-­‐2000s  the  search   for  the  “next-­‐gen”  catalog  was  in  full  swing,  and  there  are  numerous  articles  that  articulate  the   components  of  an  improved  model.  the  catalog  crossed  a  generational  line  for  good  when  the   north  carolina  state  university  libraries  (ncsu)  launched  a  new  catalog  search  engine  and   interface  with  endeca  in  january  2006.  three  ncsu  authors  published  a  thorough  article   describing  key  catalog  improvements.  their  endeca-­‐enhanced  catalog  fulfilled  the  most  important   criteria  for  a  “next-­‐gen”  catalog:  improved  search  and  retrieval  through  “relevance-­‐ranked  results,   new  browse  capabilities,  and  improved  subject  access.”7     librarians  gradually  concluded  that  the  catalog  need  not  be  written  off  but  would  benefit  from   being  enhanced  and  aligned  with  search  engine  capabilities  and  other  web-­‐like  characteristics.   catalogs  should  contain  more  information  about  titles,  such  as  book  jackets  or  reviews,  than   conventional  bibliographic  records  offered.  catalog  search  should  be  understandable  and  easy  to   use.  additional  relevant  works  should  be  presented  to  the  user  along  with  result  sets.  the   experience  should  be  interactive  and  participatory  and  provide  access  to  a  broad  array  of   resources  such  as  data  and  other  nonbook  content.8     karen  markey,  one  of  the  most  prolific  online  catalog  authors  and  analysts,  writes,  “now  that  the   era  of  mass  digitization  has  begun,  we  have  a  second  chance  at  redesigning  the  online  library   catalog,  getting  it  right,  coaxing  back  old  users  and  attracting  new  ones.”9   marshall  breeding  predicted  characteristics  of  the  next-­‐generation  catalog.  his  list  includes   expanded  scope  of  search,  more  modern  interface  techniques,  such  as  a  single  point  of  entry,   search  result  ranking,  faceted  navigation,  and  “did  you  mean  .  .  .  ?”  capacity,  as  well  as  an  expanded   search  universe  that  includes  the  full  text  of  journal  articles  and  an  array  of  digitized  resources.10     a  concept  that  is  less  represented  in  the  literature  is  that  of  envisioning  the  catalog  as  a   framework  for  service,  although  the  idea  of  the  catalog  designed  to  ensure  customer  self-­‐service   has  been  raised.11  michael  j.  
bennett  has  studied  the  effect  of  catalog  enhancements  on  circulation   and  interlibrary  loan.12  service  and  the  online  catalog  have  a  new  meaning  in  morgan’s  idea  of   “services  against  texts,”  supporting  “use  and  understand”  in  addition  to  the  traditional  “find  and   get.”13  lorcan  dempsey  commented  on  the  catalog  as  an  identifiable  service  and  predicts  new   formulations  for  library  services  based  on  the  network-­‐level  orientation  of  search  and  discovery.14   but  the  idea  that  the  catalog  has  moved  from  a  fixed,  inward-­‐focused  tool  to  an  engine  for   services—a  locus  to  be  invested  with  everything  from  unmediated  circulation  renewal  and   ordering  delivery  to  the  “did  you  mean”  search  aid—has  yet  to  be  addressed  comprehensively  in   the  literature.   enhancing  the  traditional  catalog   one  of  the  factors  that  complicates  discussions  of  the  continued  relevance  of  the  library  catalog  to   research  is  the  very  imprecision  of  the  term  in  common  parlance,  especially  when  the  chief  point     information  technology  and  libraries  |  june  2015     9   of  comparison  to  today’s  ils-­‐driven  opacs  is  google  or,  more  specifically,  google  scholar.  from   first-­‐year  writing  assignments  through  advanced  faculty  research,  many  of  the  resources  that  our   patrons  seek  are  published  in  the  periodical  literature,  and  the  library  catalog,  the  one  descended   from  the  cabinets  full  of  cards  that  occupied  prominent  real  estate  in  our  buildings,  has  never  been   an  effective  tool  for  identifying  relevant  periodical  literature.   this  situation  has  changed  in  recent  years  as  products  like  summon,  from  proquest,  and  ebsco   discovery  service  have  introduced  platforms  that  can  accommodate  electronic  article  indexing  as   well  as  marc  records  for  the  types  of  materials—books,  audio,  and  video—that  have  long  been   discovered  through  the  opac.  in  the  following  discussion  of  “catalog”  developments  and   enhancements,  we  focus  initially  not  on  these  integrated  solutions,  but  on  the  catalog  as  more   traditionally  defined.  however,  as  electronic  resources  become  an  ever-­‐greater  percentage  of   library  collections,  we  shall  see  a  convergence  of  these  two  streams  that  will  portend  significant   changes  in  the  nature  and  utility  of  the  catalog.   much  work  has  been  done  in  the  first  decade  of  the  twenty-­‐first  century  to  enhance  discovery   services  and,  as  noted  above,  north  carolina  state  university’s  introduction  of  their  endeca-­‐based   search  engine  and  interface  was  a  significant  game-­‐changer.  in  the  years  following  the   introduction  of  the  endeca  interface  at  ncsu,  the  triangle  research  libraries  network  invested  in   further  development  of  features  that  enhanced  the  utility  of  the  endeca  software  itself.   programmed  enhancements  to  the  interface  provided  additional  services  and  functionality.  in   some  cases,  these  enhancements  were  aimed  at  improving  discovery.  in  others,  they  allowed   researchers  to  make  new  and  better  use  of  the  data  that  they  found  or  made  it  easier  to  obtain  the   documents  that  they  discovered.   
faceting  and  limiting  retrieval  results   perhaps  the  most  immediately  striking  innovation  in  the  endeca  interface  was  the  introduction  of   facets.  the  use  of  faceted  browsing  allowed  users  to  parse  the  bibliographic  record  in  new  ways   (and  more  ways)  than  had  preceding  catalogs.  there  were  several  fundamentally  important  ways   faceting  enhanced  search  and  discovery.   the  first  of  these  was  the  formal  recognition  that  keyword  searching  was  the  user’s  default  means   of  interacting  with  the  catalog’s  data.  ncsu’s  initial  implementation  allowed  for  searches  using   several  indexes,  including  authors,  titles,  and  subject  headings,  and  this  functionality  remains  in   place  to  the  present  day.  however,  by  default,  searches  returned  records  containing  the  search   terms  “anywhere”  in  the  record.  this  behavior  was  more  in  line  with  user  expectations  in  an   information  ecosystem  dominated  by  google’s  single  search  box.   the  second  was  the  significantly  different  manner  in  which  multiple  limits  could  be  placed  on  an   initial  result  set  from  such  a  keyword  search.  the  concept  of  limiting  was  not  a  new  one:  certain   facets  worked  in  a  manner  consistent  with  traditional  limits  in  prior  search  interfaces,  allowing   users  to  screen  results  by  language,  or  date  of  publication,  for  example.       engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   10   it  was  the  ease  and  transparency  with  which  multiple  limits  could  be  applied  through  faceting  that   was  revolutionary.  a  user  who  entered  the  keyword  “java”  in  the  search  box  was  quickly  able  to   discriminate  between  the  programming  language  and  the  indonesian  island.  this  could  be   achieved  in  multiple  ways:  by  choosing  between  subjects  (for  example,  “application  software”  vs.   “history”)  or  clearly  labeled  lc  classification  categories  (“q  –  science”  vs.  “d  –  history”).  these   limits,  or  facets,  could  be  toggled  on  and  off,  independently  and  iteratively.   the  third  and  highly  significant  difference  resulted  from  how  library  of  congress  subject   headings  (lcsh)  were  parsed  and  indexed  in  the  system.  by  making  lcsh  subdivisions   independent  elements  of  the  subject-­‐heading  index  in  a  keyword  search,  the  endeca   implementation  unlocked  a  trove  of  metadata  that  had  been  painstakingly  curated  by  catalogers   for  nearly  a  century.  the  user  no  longer  needed  to  be  familiar  with  the  formal  structure  of  subject   headings;  if  the  keywords  appeared  anywhere  in  the  string,  the  subdivisions  in  which  they  were   contained  could  be  surfaced  and  used  as  facets  to  sharpen  the  focus  of  the  search.  this  was   revolutionary.   utilizing  the  power  of  new  indexing  structures   the  liberation  of  bibliographic  data  from  the  structure  of  marc  record  indexes  presaged  yet   another  far-­‐reaching  alteration  in  the  content  of  library  catalogs.  to  this  day,  most  commercial   integrated  library  systems  depend  on  marc  as  the  fundamental  record  structure.  in  ncsu’s   implementation,  the  multiple  indexes  built  from  that  metadata  created  a  new  framework  for   information.     
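as one way to picture the subdivision parsing described above, the sketch below splits a subject field into a main term and its subdivisions so each piece can be indexed as a separate facet value; the field representation and function name are invented for the example and are not the trln endeca pipeline itself.

```python
def lcsh_facet_values(subject_field):
    """split an lcsh field, given as (subfield code, value) pairs, into its main
    term and its subdivisions ($x, $y, $z, $v) so each becomes its own facet value."""
    main_term = " ".join(value for code, value in subject_field if code in ("a", "b"))
    subdivisions = [value for code, value in subject_field if code in ("x", "y", "z", "v")]
    return [main_term] + subdivisions

# a 650 field for the programming-language sense of "java"
field = [("a", "java (computer program language)"), ("x", "history")]
print(lcsh_facet_values(field))   # -> ['java (computer program language)', 'history']
```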
this  change  made  possible  the  integration  of  non-­‐marc  data  with  marc  data,  allowing,  for   example,  dublin  core  (dc)  records  to  be  incorporated  into  the  universe  of  metadata  to  be  indexed,   searched,  and  retrieved.  there  was  no  need  to  crosswalk  dc  to  marc:  it  sufficed  to  simply  assign   the  dc  elements  to  the  appropriate  endeca  indexes.  with  this  capacity  to  integrate  rich  collections   of  locally  described  digital  resources,  the  scope  of  the  traditional  catalog  was  enlarged.   expanding  scopes  and  banishing  silos   at  unc-­‐chapel  hill,  we  began  this  process  of  augmentation  with  selected  collections  of  digital   objects.  these  collections  were  housed  in  a  contentdm  repository  we  had  been  building  for   several  years  at  the  time  of  the  library’s  introduction  of  the  endeca  interface.  image  files,  which   had  not  been  accessible  through  traditional  catalogs,  were  among  the  first  to  be  added.  for   example,  we  had  been  given  a  large  collection  of  illustrated  postcards  featuring  scenes  of  north   carolina  cities  and  towns.  these  postcards  had  been  digitized  and  metadata  describing  the  image   and  the  town  had  been  recorded.  other  collections  of  digitized  historical  photographs  were  also   selected  for  inclusion  in  the  catalog.  these  historical  resources  proved  to  be  a  boon  to  faculty   teaching  local  history  courses  and,  interestingly,  to  students  working  on  digital  projects  for  their   classes.  as  class  assignments  came  to  include  activities  like  creating  maps  enhanced  by  the     information  technology  and  libraries  |  june  2015     11   addition  of  digital  photographs  or  digitized  newspaper  clippings,  the  easy  discovery  of  these   formerly  hidden  collections  enriched  students’  learning  experience.   other  special  collection  materials  had  been  represented  in  the  traditional  catalog  in  somewhat   limited  fashion.  the  most  common  examples  were  manuscripts  collections.  the  processing  of   these  collections  had  always  resulted  in  the  creation  of  finding  aids,  produced  since  the  1930s   using  index  cards  and  typewriters.  during  the  last  years  of  the  twentieth  century,  archivists  began   migrating  many  of  these  finding  aids  to  the  web  using  the  ead  format,  presenting  them  as  simple   html  pages.  these  finding  aids  were  accessible  through  the  catalog  by  means  of  generalized   marc  records  that  described  the  collections  at  a  superficial  level.  however,  once  we  attained  the   ability  to  integrate  the  contents  of  the  finding  aids  themselves  into  the  indexes  underlying  the  new   interface,  this  much  richer  trove  of  keyword-­‐searchable  data  vastly  increased  the  discoverability   and  use  of  these  collections.   during  this  period,  the  library  also  undertook  systematic  digitization  of  many  of  these  manuscript   collections.  whenever  staff  received  a  request  for  duplication  of  an  item  from  a  manuscript   collection  (formerly  photocopies,  but  by  then  primarily  digital  copies),  we  digitized  the  entire   folder  in  which  that  item  was  housed.  we  developed  standards  for  naming  these  digital  surrogates   that  associated  the  individual  image  with  the  finding  aid.  
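a naming convention of that kind might look like the sketch below, which derives the finding-aid location for a digitized folder from an image filename; the pattern, url shape, and function name are all invented for illustration and are not the library's actual scheme.

```python
import re

# hypothetical convention: <collection number>_<folder number>_<image sequence>.jpg
SURROGATE_PATTERN = re.compile(r"^(?P<collection>\d{5})_(?P<folder>\d{4})_(?P<seq>\d{4})\.jpg$")

def finding_aid_location(filename, base_url="https://finding-aids.example.edu"):
    """derive the finding-aid page and folder anchor that a digital surrogate
    belongs to, so the image can be linked to the finding aid dynamically."""
    match = SURROGATE_PATTERN.match(filename)
    if match is None:
        return None
    collection = match.group("collection")
    folder = int(match.group("folder"))
    return f"{base_url}/{collection}/#folder_{folder}"

print(finding_aid_location("05420_0007_0001.jpg"))   # -> https://finding-aids.example.edu/05420/#folder_7
```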
it  then  became  a  simple  matter,  involving   the  addition  of  a  short  javascript  string  to  the  head  of  the  online  finding  aid,  to  dynamically  link   the  digital  objects  to  the  finding  aid  itself.     other  library  collections  likewise  benefited  from  the  new  indexing  structures.  some  uncataloged   materials  traditionally  had  minimal  bibliographic  control  provided  by  inventories  that  were  built   at  the  time  of  accession  in  desktop  database  applications;  funding  constraints  meant  that  full   cataloging  of  these  materials  (often  rare  books)  remained  elusive.  the  ability  to  take  the  data  that   we  had  and  blend  it  into  the  catalog  enhanced  the  discovery  of  these  collections  as  well.   we  also  have  an  extensive  collection  of  video  resources,  including  commercial  and  educational   films.  the  conventions  for  cataloging  these  materials,  held  over  from  the  days  of  catalog  cards,   often  did  not  match  user  expectations  for  search  and  discovery.  there  were  limits  to  the  number   of  added  entries  that  catalogers  would  make  for  directors,  actors,  and  others  associated  with  a   film.  many  records  lacked  the  kind  of  genre  descriptors  that  undergraduates  were  likely  to  use   when  seeking  a  film  for  an  evening’s  entertainment.  to  compensate  for  these  limitations,  staff  who   managed  the  collection  had  again  developed  local  database  applications  that  allowed  for  the   inclusion  of  more  extensive  metadata  and  for  categories  such  as  country  of  origin  or  folksonomic   genres  that  patrons  frequently  indicated  were  desirable  access  points.  once  again,  the  new   indexing  structures  allowed  us  to  incorporate  this  rich  set  of  metadata  into  what  looked  like  the   traditional  catalog.   each  of  the  instances  described  above  represents  what  we  commonly  call  the  destruction  of  silos.   information  about  library  collections  that  had  been  scattered  in  numerous  locations—and  not  all   of  them  online—was  integrated  into  a  single  point  of  discovery.  it  was  our  hope  and  intention  that     engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   12   such  integration  would  drive  more  users  to  the  catalog  as  a  discovery  tool  for  the  library’s  diverse   collections  and  not  simply  for  the  traditional  monographic  and  serials  collections  that  had  been   served  by  marc  cataloging.  usage  logs  indicate  that  the  average  number  of  searches  conducted  in   the  catalog  rose  from  approximately  13,000  per  day  in  2009  to  around  19,000  per  day  in  2013.  it   is  impossible  to  tell  with  any  certainty  whether  there  was  heavier  use  of  the  catalog  simply   because  increasingly  varied  resources  came  to  be  represented  in  it,  but  we  firmly  believe  that  the   experience  of  users  who  search  for  material  in  our  catalog  has  become  much  richer  as  a  result  of   these  changes  to  its  structure  and  content.   cooperation  encouraging  creativity   another  way  we  were  able  to  harness  the  power  of  endeca’s  indexing  scheme  involved  the  shared   loading  of  bibliographic  records  for  electronic  resources  to  which  multiple  trln  libraries   provided  access.  
trln’s  endeca  indexes  are  built  from  the  records  of  each  member.  each   institution  has  a  “pipeline”  that  feeds  metadata  into  the  combined  trln  index.  duplicate  records   are  rolled  up  into  a  single  display  via  oclc  control  numbers  whenever  possible,  and  the   bibliographic  record  is  annotated  with  holdings  statements  for  the  appropriate  libraries.   we  quickly  realized  that  where  any  of  the  four  institutions  shared  electronic  access  to  materials,  it   was  redundant  to  load  copies  of  each  record  into  the  local  databases  of  each  institution.15  instead,   one  institution  could  take  responsibility  for  a  set  of  records  representing  shared  resources.   examples  of  such  material  include  electronic  government  documents  with  records  provided  by   the  marcive  documents  without  shelves  program,  large  sets  like  early  english  books  online,  and   pbs  videos  streamed  by  the  statewide  services  of  nc  live.   in  practice,  one  institution  takes  responsibility  for  loading,  editing,  and  performing  authority   control  on  a  given  set  of  records.  (for  example,  unc,  as  the  regional  depository,  manages  the   documents  without  shelves  record  set.)  these  records  are  loaded  with  a  special  flag  indicating   that  they  are  part  of  the  shared  records  program.  this  flag  generates  a  holdings  statement  that   reflects  the  availability  of  the  electronic  item  at  each  institution.  the  individual  holdings   statements  contain  the  institution-­‐specific  proxy  server  information  to  enable  and  expedite  access.   in  addition  to  this  distributed  model  of  record  loading  and  maintenance,  we  were  able  to  leverage   oai-­‐pmh  feeds  to  add  selected  resources  to  the  searchtrln  database.  all  four  institutions  have   access  to  the  data  made  available  by  the  inter-­‐university  consortium  for  political  and  social   research  (icpsr).  as  we  do  not  license  these  resources  or  maintain  them  locally,  and  as  records   provided  by  icpsr  can  change  over  time,  we  developed  a  mechanism  to  harvest  the  metadata  and   push  it  through  a  pipeline  directly  into  the  searchtrln  indexes.  none  of  the  member  libraries’   local  databases  house  this  metadata,  but  the  records  are  made  available  to  all  nonetheless.   while  we  were  engaged  in  implementing  these  enhancements,  additional  sources  of  potential   enrichment  of  the  catalog  were  appearing.  in  particular,  vendors  began  providing  indexing   services  for  the  vast  quantities  of  electronic  resources  contained  in  aggregator  databases.     information  technology  and  libraries  |  june  2015     13   additionally,  they  made  it  possible  for  patrons  to  move  seamlessly  from  the  catalog  to  those   electronic  resources  via  openurl  technologies.  indeed,  services  like  proquest’s  summon  or   ebsco’s  discovery  service  might  be  taken  as  another  step  toward  challenging  the  catalog’s   primacy  as  a  discovery  tool  as  they  offered  the  prospect  of  making  local  catalog  records  just  a   fraction  of  a  much  larger  universe  of  bibliographic  information  available  in  a  single,  keyword-­‐ searchable  database.   
against this backdrop, it remains to be seen whether continuing to load many kinds of marc records into the local database is an effective aid to discovery even with the multiple delimiting capabilities that endeca provides. what is certain, however, is that our approach to indexing resources of any kind has undergone a radical transformation over the past few years, a transformation that goes beyond the introduction of any of the particular changes we have discussed so far.

promoting a culture of innovation

one important way endeca has changed our libraries is that a culture of constant innovation has become the norm, rather than the exception, for our catalog interface and content. once we were no longer subject to the customary cycle of submitting enhancement requests to an integrated library system vendor, hoping that fellow customers shared similar desires, and waiting for a response and, if we were lucky, implementation, we were able to take control of our aspirations. we had the future of the interface to our collections in our own hands, and within a few years of the introduction of endeca by ncsu, we were routinely adding new features to enhance its functionality.

one of the first of these enhancements was the addition of a "type-ahead" or "auto-suggest" option.16 inspired by google's autocomplete feature, this service suggests phrases that might match the keywords a patron is typing into the search box. ben pennell, one of the chief programmers working on endeca enhancement at unc-chapel hill, built a solr index from the ils author, title, and subject indexes and from a log of recent searches. as a patron typed, a drop-down box appeared below the search box. the drop-down contained matching terms extracted from the solr index almost immediately. for example, typing the letters "bein" into the box produced a list including "being john malkovich," "nature—effects of human beings on," "human beings," and "bein, alex, 1903–1988." the matched letters (italicized in the examples above) are highlighted in a different color in the drop-down display. in the case of terms drawn directly from an index, the index name appears, also highlighted, on the right side of the box. for example, the second and third terms in the examples above are tagged with the term "subject." the last example is an "author." (a minimal sketch of such a prefix lookup appears below.)

in allowing for the textual mining of lcsh, the initial implementation of faceting in the endeca catalog surfaced those headings for the patron by uniting keyword and controlled vocabularies in an unprecedented manner. there was a remarkable and almost immediate increase in the number of authority index searches entered into the system. at the end of the fall semester prior to the implementation of the auto-suggest feature, an average of around 1,400 subject searches was conducted per week.
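as an illustration of the kind of lookup just described (and not the production service, which pennell and sexton document in detail), a bare-bones prefix query against a solr index might look like the sketch below; the core name, url, and field names are hypothetical.

```python
"""type-ahead sketch: prefix lookup against a solr core (names hypothetical)."""
import requests

SOLR_SELECT = "http://localhost:8983/solr/suggest/select"  # hypothetical core

def suggest(prefix, limit=10):
    """return (heading, heading_type) pairs whose heading starts with prefix."""
    params = {
        "q": f"heading:{prefix}*",      # naive prefix query; real code would escape input
        "fl": "heading,heading_type",
        "rows": limit,
        "wt": "json",
    }
    docs = requests.get(SOLR_SELECT, params=params, timeout=2).json()["response"]["docs"]
    return [(d.get("heading"), d.get("heading_type")) for d in docs]

# suggest("bein") might return, e.g.,
# [("bein, alex, 1903-1988", "author"), ("being john malkovich", "title"), ...]
```

a production index would typically use edge n-gram analysis or a dedicated suggester so that matches inside a heading (as in "nature—effects of human beings on") are caught as well. the usage figures before and after the feature went live suggest how much difference it made.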
approximately  one  month  into  the  spring  semester,  that  average  had  risen  to   around  4,000  subject  searches  per  week.  use  of  the  author  and  title  indexes  also  rose,  although   not  quite  as  dramatically.  in  the  perpetual  tug-­‐of-­‐war  between  precision  and  recall,  the  balance   had  decidedly  shifted.   another  service  that  we  provide,  which  is  especially  popular  with  students,  is  the  ability  to   produce  citations  formatted  in  one  of  several  commonly  used  bibliographic  styles,  including  apa,   mla,  and  chicago  (both  author-­‐date  and  note-­‐and-­‐bibliography  formats).  this  functionality,  first   introduced  by  ncsu  and  then  jointly  developed  with  unc  over  the  years  that  followed,  works  in   two  ways.  if  a  patron  finds  a  monographic  title  in  the  catalog,  simply  clicking  on  a  link  labeled  “cite”   produces  a  properly  formatted  citation  that  can  then  be  copied  and  pasted  into  a  document.  the   underlying  technology  also  powers  a  “citation  builder”  function  by  which  a  patron  can  enter  basic   bibliographic  information  for  a  book,  a  chapter  or  essay,  a  newspaper  or  journal  article,  or  a   website  into  a  form,  click  the  “submit”  button,  and  receive  a  citation  in  the  desired  format.     an  additional  example  of  innovation  that  falls  somewhat  outside  the  scope  of  the  changes   discussed  above  was  the  development  of  a  system  that  allowed  for  the  mapping  of  simplified   chinese  characters  to  their  traditional  counterparts.  searching  in  non-­‐roman  character  sets  has   always  offered  a  host  of  challenges  to  library  catalog  users.  the  trln  libraries  have  embraced  the   potential  of  endeca  to  reduce  some  of  these  challenges,  particularly  for  chinese,  through  the   development  of  better  keyword  searching  strategies  and  the  automatic  translation  of  simplified  to   traditional  characters.   since  we  had  complete  control  over  the  endeca  interface,  it  proved  relatively  simple  to  integrate   document  delivery  services  directly  into  the  functionality  of  the  catalog.  rather  than  simply   emailing  a  bibliographic  citation  or  a  call  number  to  themselves,  patrons  could  request  the   delivery  of  library  materials  directly  to  their  campus  addresses.  once  we  had  implemented  this   feature,  we  quickly  moved  to  amplify  its  power.  many  catalogs  offer  a  “shopping  cart”  service  that   allows  patrons  to  compile  lists  of  titles.  one  variation  on  this  concept  that  we  believe  is  unique  to   our  library  is  the  ability  for  a  professor  to  compile  such  a  list  of  materials  held  by  the  libraries  on   campus  and  submit  that  list  directly  to  the  reserve  reading  department,  where  the  books  are   pulled  from  the  shelves  and  placed  on  course-­‐reserve  lists  without  the  professor  needing  to  visit   any  particular  library  branch.  these  new  features,  in  combination  with  other  service   enhancements  such  as  the  delivery  of  physical  documents  to  campus  addresses  from  our  on-­‐ campus  libraries  and  our  remote  storage  facility,  have  increased  the  usefulness  of  the  catalog  as   well  as  our  users’  satisfaction  with  the  library.  
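the simplified-to-traditional mapping mentioned above can be pictured as a lookup table applied to the query string before it is sent to the index. the toy sketch below is illustrative only: the character pairs are a tiny hypothetical sample, real conversion has to deal with one-to-many mappings, and this is not the trln implementation.

```python
"""toy simplified-to-traditional query expansion; mapping table is illustrative."""
S2T = {"图": "圖", "书": "書", "馆": "館", "历": "歷"}

def to_traditional(query: str) -> str:
    """character-by-character substitution; unmapped characters pass through."""
    return "".join(S2T.get(ch, ch) for ch in query)

def query_variants(query: str):
    """search both scripts so records cataloged in either form are retrieved."""
    trad = to_traditional(query)
    return [query] if trad == query else [query, trad]

# query_variants("图书馆") -> ["图书馆", "圖書館"]
```

searching on both variants, rather than rewriting the stored records, leaves the underlying cataloging untouched.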
we  believe  that  these  changes  have  contributed  to   the  ongoing  vitality  of  the  catalog  and  to  its  continued  importance  to  our  community.   in  december  2012,  the  libraries  adopted  proquest’s  summon  to  provide  enhanced  access  to   article  literature  and  electronic  resources  more  generally.  at  the  start  of  the  following  fall   semester,  the  libraries  instituted  another  major  change  to  our  discovery  and  delivery  services   through  a  combined  single-­‐search  box  on  our  home  page.  this  has  fundamentally  altered  how     information  technology  and  libraries  |  june  2015     15   patrons  interact  with  our  catalog  and  its  associated  resources.  first,  because  we  are  now   searching  both  the  catalog  and  the  summon  index,  the  type-­‐ahead  feature  that  we  had  deployed  to   suggest  index  terms  from  our  local  database  to  users  as  they  entered  search  strings  no  longer   functions  as  an  authority  index  search.  we  have  returned  to  querying  both  databases  through  a   simple  keyword  search.     second,  in  our  implementation  of  the  single  search  interface  we  have  chosen  to  present  the  results   from  our  local  database  and  the  retrievals  from  summon  in  two,  side-­‐by-­‐side  columns.  this  has   the  advantage  of  bringing  article  literature  and  other  resources  indexed  by  summon  directly  to   the  patron’s  attention.  as  a  result,  more  patrons  interact  directly  with  articles,  as  well  as  with   books  in  major  digital  repositories  like  google  books  and  hathitrust.  this  change  has   undoubtedly  led  patrons  to  make  less  in-­‐depth  use  of  the  local  catalog  database,  although  it   preserves  much  of  the  added  functionality  in  terms  of  discovering  our  own  digital  collections  as   well  as  those  resources  whose  cataloging  we  share  with  our  trln  partners.  we  believe  that  the   ease  of  access  to  the  resources  indexed  by  summon  complements  the  enhancements  we  have   made  to  our  local  catalog.   conclusion  and  further  directions   one  might  argue  that  the  integration  of  electronic  resources  into  the  “catalog”  actually  shifts  the   paradigm  more  significantly  than  any  previous  enhancements.  as  the  literature  review  indicates,   much  of  the  conversation  about  enriching  library  catalogs  has  centered  on  improving  the  means   by  which  search  and  discovery  are  conducted.  the  reasonably  direct  linking  to  full  text  that  is  now   possible  has  once  again  radically  shifted  that  conversation,  for  the  catalog  has  come  to  be  seen  not   simply  as  a  discovery  platform  based  on  metadata  but  as  an  integrated  system  for  delivering  the   essential  information  resources  for  which  users  are  searching.   once  the  catalog  is  understood  to  be  a  locus  for  delivering  content  in  addition  to  discovering  it,  the   local  information  ecosystem  can  be  fundamentally  altered.  at  unc-­‐chapel  hill  we  have  engaged  in   a  process  whereby  the  catalog,  central  to  the  library’s  web  presence  (given  the  prominence  of  the   single  search  box  on  the  home  page),  has  become  a  hub  from  which  many  other  services  are   delivered.  
the  most  obvious  of  these,  perhaps,  is  a  system  for  the  delivery  of  physical  documents   that  is  analogous  to  the  ability  to  retrieve  the  full  text  of  electronic  documents.  if  an  information   source  is  discovered  that  exists  in  the  library  only  in  physical  form,  enhancements  to  the  display  of   the  catalog  record  facilitate  the  receipt  by  the  user  of  the  print  book  or  a  scanned  copy  of  an  article   from  a  bound  journal  in  the  stacks.     in  2013,  ithaka  s+r  conducted  a  local  unc  faculty  survey.  the  survey  posed  three  questions   related  to  the  catalog.  in  response  to  the  question,  “typically  when  you  are  conducting  academic   research,  which  of  these  four  starting  points  do  you  use  to  begin  locating  information  for  your   research?,”  41  percent  chose  “a  specific  electronic  research  resource/computer  database.”  nearly   one-­‐third  (30  percent)  chose  “your  online  library  catalog.”17     engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   16   when  asked,  “when  you  try  to  locate  a  specific  piece  of  secondary  scholarly  literature  that  you   already  know  about  but  do  not  have  in  hand,  how  do  you  most  often  begin  your  process?,”  41   percent  chose  the  library’s  website  or  online  catalog,  and  40  percent  chose  “search  on  a  specific   scholarly  database  or  search  engine.”  in  response  to  the  question,  “how  important  is  it  that  the   library  .  .  .  serves  as  a  starting  point  or  ‘gateway’  for  locating  information  for  my  research?,”  78   percent  answered  extremely  important.     on  several  questions,  ithaka  provided  the  scores  for  an  aggregation  of  unc’s  peer  libraries.  for   the  first  question  (the  starting  point  for  locating  information),  18  percent  of  national  peers  chose   the  online  catalog  compared  to  30  percent  at  unc.  on  the  importance  of  the  library  as  gateway,  61   percent  of  national  peers  answered  very  important  compared  to  the  78  percent  at  unc.   in  2014,  the  unc  libraries  were  among  a  handful  of  academic  research  libraries  that  implemented   a  new  ithaka  student  survey.  though  we  don’t  have  national  benchmarks,  we  can  compare  our   own  student  and  faculty  responses.  among  graduate  students,  31  percent  chose  the  online  catalog   as  the  starting  point  for  their  research,  similar  to  the  faculty.18  of  the  undergraduate  students,  33   percent  chose  the  library’s  website,  which  provides  access  to  the  catalog  through  a  single  search   box.19   a  finding  that  approximately  a  third  of  students  began  their  search  on  the  unc  library  website   was  gratifying.  oclc’s  perceptions  of  libraries  2010  reported  survey  results  regarding  where   people  start  their  information  searches.  in  2005,  1  percent  said  they  started  on  a  library  website;   in  2010,  not  a  single  respondent  indicated  doing  so.20     the  gross  disparity  between  the  oclc  reports  and  the  ithaka  surveys  of  our  faculty  and  students   requires  some  explanation.  
the  libraries  at  the  university  of  north  carolina  at  chapel  hill  are   proud  of  a  long  tradition  of  ardent  and  vocal  support  from  the  faculty,  and  we  are  not  surprised  to   learn  that  students  share  their  loyalty.  for  us,  the  recently  completed  ithaka  surveys  point  out   directions  for  further  investigation  into  our  patrons’  use  of  our  catalog  and  why  they  feel  it  is  so   critical  to  their  research.   anecdotal  reports  indicate  that  one  of  the  most  highly  valued  services  that  the  libraries  provide  is   delivery  of  physical  materials  to  campus  addresses.  some  faculty  admit  with  a  certain  degree  of   diffidence  that  our  services  have  made  it  almost  unnecessary  to  set  foot  in  our  buildings;  that  is  a   trend  that  has  also  been  echoed  in  conversations  with  our  peers.  yet  the  online  presence  of  the   library  and  its  collections  continues  to  be  of  significant  importance—perhaps  precisely  because  it   offers  an  effective  gateway  to  a  wide  range  of  materials  and  services.   we  believe  that  the  radical  redesign  of  the  online  public  access  catalog  initiated  by  north  carolina   state  university  in  2006  marked  a  sea  change  in  interface  design  and  discovery  services  for  that   venerable  library  service.  without  a  doubt,  continued  innovation  has  enhanced  discovery.   however,  we  have  come  to  realize  that  discovery  is  only  one  function  that  the  online  catalog  can   and  should  serve  today.  equally  if  not  more  important  is  the  delivery  of  information  to  the     information  technology  and  libraries  |  june  2015     17   patron’s  home  or  office.  the  integration  of  discovery  and  delivery  is  what  sets  the  “next-­‐gen”   catalog  apart  from  its  predecessors,  and  we  must  strive  to  keep  that  orientation  in  mind,  not  only   as  we  continue  to  enhance  the  catalog  and  its  services,  but  as  we  ponder  the  role  of  the  library  as   place  in  the  coming  years.  far  from  being  in  decline,  the  online  catalog  continues  to  be  an  “engine   of  innovation”  (to  borrow  a  phrase  from  holden  thorp,  former  chancellor  of  unc-­‐chapel  hill)  and   a  source  of  new  challenges  for  our  libraries  and  our  profession.   references     1.     cathy  de  rosa  et  al.,  perceptions  of  libraries  and  information  resources:  a  report  to  the  oclc   membership  (dublin,  oh:  oclc  online  computer  library  center,  2005),  1–17,   https://www.oclc.org/en-­‐us/reports/2005perceptions.html.   2.     karen  calhoun,  the  changing  nature  of  the  catalog  and  its  integration  with  other  discovery   tools,  final  report,  prepared  for  the  library  of  congress  (ithaca,  ny:  k.  calhoun,  2006),  5,   http://www.loc.gov/catdir/calhoun-­‐report-­‐final.pdf.   3.     roger  c.  schonfeld  and  kevin  m.  guthrie,  “the  changing  information  services  needs  of   faculty,”  educause  review  42,  no.  4  (july/august  2007):  8,   http://www.educause.edu/ero/article/changing-­‐information-­‐services-­‐needs-­‐faculty.   4.     
ross housewright and roger schonfeld, ithaka's 2006 studies of key stakeholders in the digital transformation in higher education (new york: ithaka s+r, 2008), 6, http://www.sr.ithaka.org/sites/default/files/reports/ithakas_2006_studies_stakeholders_digital_transformation_higher_education.pdf.

5. xi niu and bradley m. hemminger, "beyond text querying and ranking list: how people are searching through faceted catalogs in two library environments," proceedings of the american society for information science & technology 47, no. 1 (2010): 1–9, http://dx.doi.org/10.1002/meet.14504701294; and cory lown, tito sierra, and josh boyer, "how users search the library from a single search box," college & research libraries 74, no. 3 (2013): 227–41, http://crl.acrl.org/content/74/3/227.full.pdf.

6. charles r. hildreth, "beyond boolean; designing the next generation of online catalogs," library trends (spring 1987): 647–67, http://hdl.handle.net/2142/7500.

7. kristen antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology and libraries 25, no. 3 (2006): 129, http://dx.doi.org/10.6017/ital.v25i3.3342.

8. karen coyle, "the library catalog: some possible futures," journal of academic librarianship 33, no. 3 (2007): 415–16, http://dx.doi.org/10.1016/j.acalib.2007.03.001.

9. karen markey, "the online library catalog: paradise lost and paradise regained?" d-lib magazine 13, no. 1/2 (2007): 2, http://dx.doi.org/10.1045/january2007-markey.

10. marshall breeding, "next-gen library catalogs," library technology reports (july/august 2007): 10–13.

11. jia mi and cathy weng, "revitalizing the library opac: interface, searching, and display challenges," information technology and libraries 27, no. 1 (2008): 17–18, http://dx.doi.org/10.6017/ital.v27i1.3259.

12. michael j. bennett, "opac design enhancements and their effects on circulation and resource sharing within the library consortium environment," information technology and libraries 26, no. 1 (2007): 36–46, http://dx.doi.org/10.6017/ital.v26i1.3287.

13. eric lease morgan, "use and understand; the inclusion of services against texts in library catalogs and discovery systems," library hi tech (2012): 35–59, http://dx.doi.org/10.1108/07378831211213201.

14. lorcan dempsey, "thirteen ways of looking at libraries, discovery, and the catalog: scale, workflow, attention," educause review online (december 10, 2012), http://www.educause.edu/ero/article/thirteen-ways-looking-libraries-discovery-and-catalog-scale-workflow-attention.

15. charles pennell, natalie sommerville, and derek a. rodriguez, "shared resources, shared records: letting go of local metadata hosting within a consortium environment," library resources & technical services 57, no. 4 (2013): 227–38, http://journals.ala.org/lrts/article/view/5586.

16.
benjamin  pennell  and  jill  sexton,  “implementing  a  real-­‐time  suggestion  service  in  a  library   discovery  layer,”  code4lib  journal  10  (2010),  http://journal.code4lib.org/articles/3022.     17.    ithaka  s+r,  unc  chapel  hill  faculty  survey:  report  of  findings  (unpublished  report  to  the   university  of  north  carolina  at  chapel  hill,  2013),  questions  20,  21,  33.   18.    ithaka  s+r,  unc  chapel  hill  graduate  student  survey:  report  of  findings  (unpublished  report   to  the  university  of  north  carolina  at  chapel  hill,  2014),  47.   19.    ithaka  s+r,  unc  chapel  hill  undergraduate  student  survey:  report  of  findings  (unpublished   report  to  the  university  of  north  carolina  at  chapel  hill,  2014),  39.   20.    cathy  de  rosa  et  al.,  perceptions  of  libraries,  2010:  context  and  community:  a  report  to  the   oclc  membership  (dublin,  oh:  oclc  online  computer  library  center,  2011),  32,   http://oclc.org/content/dam/oclc/reports/2010perceptions/2010perceptions_all.pdf.     editorial board thoughts: a considerable technology asset that has little to do with technology mark dehmlow information technology and libraries | march 2014 4 for this issue’s editorial, i thought i would set aside the trendy topics like discovery, the clo ud, and open . . . well, everything—source, data, science—and instead focus on an area that i think has more long-term implications for technologists and libraries. for technologists in libraries, probably any industry really, i believe our most important challenges aren’t technical at all. for the average “techie,” even if an issue is complex, it is often finite and ultimately traceable to a root cause—the programmer left off a semi-colon in a line of code, the support person forgot to plug in the network cable, or the systems administrator had a server choke after a critical kernel error. debugging people issues, on the other hand, is much less reductive. people are nothing but variables who respond to conflict with emotion and can become entrenched in their perspectives (right or wrong). at a minimum, people are unpredictable. the skill set to navigate people and personalities requires patience, flexibility, seeing the importance of the relationship through the 1s and 0s, and often developing mutual trust. working with technology benefits from one’s intelligence (iq), but working with people requires a deeper connection to perception, self-awareness, body language, and emotions, all parts of emotional intelligence (eq). eq is relevant to all areas of life and work, but i think particularly relevant to technology workers. of particular importance are eq traits related to emotional regulation, self-awareness, and the ability to pick up social queues. my primary reasoning for this is that technology is (1) fairly opaque to people outside of technology areas and (2) technology is driving so much of the rapid change we are experiencing in libraries. it units in traditional organizations have a significant challenge because many root issues in technology are not well understood, and change is uncomfortable for most, so it is easy to resent technology for being such a strong catalyst for change. 
as a result, it is becoming more incumbent upon us in technology to not only instantiate change in our organizations but also to help manage that change through clear communication, clear expectation setting, defining reasonable timeframes that accommodate individuals’ needs to adapt to change, a commitment to shift behavior through influence, and just plain old really good listening. i would like to issue a bit of a challenge to technology managers as you are making hiring decisions. if you want the best possible working relationships with other functional areas in the library, especially traditional areas, spend time evaluating candidates for soft skills like a relaxed demeanor; patience; clear, but not condescending, communication; and a personal commitment to mark dehmlow (mdehmlow@nd.edu), a member of lita and the ital editorial board, is director, information technology program, hesburgh libraries, university of notre dame, south bend, indiana. editorial board thoughts: a considerable technology asset | dehmlow 5 serving others. these skills are very hard to teach. they can be developed if one is committed to developing them, but more often than not, they are innate. if a candidate has those traits as a base but also has an aptitude for understanding technology, that individual will likely be the kind of employee people will want to keep, certainly much more so than someone who has incredible technical skill but little social intelligence. for those who are interested in developing their eq, there are many of tools available—a million management books on team building, servant leadership, influencing coworkers, providing excellent service, etc. personally, i have found that developing a better sense of self-awareness is one of the best ways to increase one’s eq. tests such as the meyers briggs type indicator ,1 the strategic leadership type indicator ,2 and the disc,3 which categorize your personality and work-style traits, can be very effective tools for understanding how you approach your work and how your work style may affect your peers. combined with a willingness to flex your style based on the personalities of your coworkers, these can be very powerful tools for influencing outcomes. most importantly, i have found putting the importance of the relationship above the task or goal can make a remarkable difference in cultivating trust and collaboration. self-awareness and flexible approaches not only have the opportunity to improve internal relationships between technology and traditional functional areas of the library, but between techies and end users. we are using technology in many new creative ways to support end users, meaning techies are more and more likely to have direct contact with users. in many ways, our reputation as a committed service profession will be affected by out tech staffs’ ability to interact well with end users, and ultimately, i believe the proportion of our tech staff that have a high eq could be one the strongest predictor s of the long-term success for technology teams in libraries. references 1. “my mbti personality type,” the myers briggs foundation, http://www.myersbriggs.org/mymbti-personality-type/mbti-basics. 2. “strategic leadership type indicator —leader’s self assessment,” hrd press, http://www.hrdpress.com/slti. 3. “remember that boss who you just couldn’t get through to? we know why…and we can help,” everything disc, http://www.everythingdisc.com/disc-personality-assessment-about.aspx. 
http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/ http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/ http://www.hrdpress.com/slti http://www.everythingdisc.com/disc-personality-assessment-about.aspx a s i approach the end of my tenure as ital edi­ tor, i reflect on the many lita members who have not submitted articles for possible publica­ tion in our journal. i am especially mindful of the smaller number who have promised or hinted or implied that they intended to or might submit articles. admittedly, some of them may have done so because i asked them, and their replies to me were the polite ones that one expects of the honorable members of the library and information technology association of the american library association. librarians are as individuals almost all or almost always polite in their professional discourse. pondering these potential authors, particularly the smaller number, i conjured a mental picture of a fictional, male, potential ital author. i don’t know why my fic­ tional potential author was male—it may be because more males than females are members of that group; it may be because i’m a male; or it may be unconscious sex­ ism. i’m not very self­analytic. my mental picture of this fictional male potential author saw him driving home from his place of employ­ ment after having an after­work half gallon of rum when, into the picture, a rattlesnake crawled on to the seat of his car and bit him on the scrotum. lucky him: he was, after all, a figment of my imagina­ tion. (any resemblance between my fictional author and a real potential author is purely coincidental.) lucky me: we all know that such an incident is not unthinkable in library land. lucky lita: it is unlikely that any member will cancel his or her membership or any subscriber, his, her, or its subscription because the technical term “scro­ tum” found its way into my editorial. ital is, after all, a technology journal, and members and readers ought to be offended if our journal abjures technical terminology. likewise they should be offended if our articles discuss library technology issues misusing technical terms or concepts, or confusing technical issues with policy issues, or stating technology problems or issues in the title or abstract or introduction then omitting any mention of said problems until the final paragraph(s). ital referees are quite diligent in questioning authors when they think terminology has been used loosely. their close readings of manuscripts have caught more than one author mislabeling policies related to the uses of informa­ tion technologies as if the policies were themselves tech­ nical conundrums. most commonly, they have required authors who state major theses or technology problems at the beginnings of their manuscripts, then all but ignore these until the final paragraphs, to rewrite sections of their manuscripts to emphasize the often interesting questions raised at the outset. what, pray tell, is the editor trying to communicate to readers? two things, primarily. first, i have been following with interest the several heated discussions that have taken place on lita­l for the past number of months. sometimes, the idea of the traditional quarterly scholarly/professional journal in a field changing so rapidly may seem almost quaint. a typical ital article is five months old when it is pub­ lished. a typical discussion thread on lita­l happens in “real time” and lasts two days at most. 
a small number of participants raise and “solve” an issue in less than a half dozen posts. a few times, however, a question asked or a comment posted by a lita member has led to a flurry of irrelevant postings, or, possibly worse, sustained bomb­ ing runs from at least two opposing camps that have left some members begging to be removed from the list until the all clear signal has been sounded. i’ve read all of these, and i could not help but won­ der, what if ital accepted manuscripts as short as lita­l postings? what would our referees do? i suspect, for our readers’ sakes, most would be rejected. authors whose manuscripts are rejected receive the comments made by the referees and me explaining why we cannot accept their submissions. the most frequent reason is that they are out of scope, irrelevant to the purposes of lita. when someone posts a technology question to lita­l that gener­ ates responses advising the questioner that implementing the technology in question is bad policy, the responses are, from an editor’s point of view, out of scope. how many lita members have authority—real authority—to set policy for their libraries? a second “popular” reason for rejections is that the manuscripts pose “false” problems that may be technological but that are not technologies that are within the “control” of libraries. these are out of scope in a different manner. third, some manuscripts do not pass the “so what” test. some days i wish that lita­l responders would referee, honestly, their own responses for their relevance to the questions or issues or so­whatness and to the membership. second, and more importantly to me, lita members, whether or not your bodies include the part that we all have come to know and defend, do you have the “­” to send your ital editor a manuscript to be chewed upon not by rattlesnakes but by the skilled professionals who are your ital editorial board members and referees? i hope (and do i dare beg again?) so. your journal will not suffer quaintness unless you make it so. editorial: the virtues of deliberation john webb john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university, and editor of information technology and libraries. editorial | webb 3 36 information technology and libraries | march 200736 information technology and libraries | march 2007 author id box for 2 column layout opac design enhancements and their effects on circulation and resource sharing within the library consortium environment michael j. bennett a longitudinal study of three discrete online public access catalog (opac) design enhancements examined the possible effects such changes may have on circulation and resource sharing within the automated library consortium environment. statistical comparisons were made of both circulation and interlibrary loan (ill) figures from the year before enhancement to the year after implementation. data from sixteen libraries covering a seven-year period were studied in order to determine the degree to which patrons may or may not utilize increasingly broader opac ill options over time. results indicated that while ill totals increased significantly after each opac enhancement, such gains did not result in significant corresponding changes in total circulation. m ost previous studies of online public access catalog (opac) use and design have centered on transaction­log analysis and user survey results in the academic library environment. 
measures of patron success or lack thereof have traditionally been expressed in the form of such concepts as “zero­hit” analysis or the “branching” analysis of kantor and, later, ciliberti.1 missing from the majority of the literature on opac study, however, are the effects that use and design have had on public library patron borrowing practices. major drawbacks to transaction­log analyses and user surveys as a measure of successful opac use include a lack of standardization and the inherent difficulties in interpreting resulting data. as peters notes, “[s]urveys measure users’ opinions about online catalogs and their perceptions of their successes or failures when using them, while transaction logs simply record the searches conducted by users. surveys,” he concludes, “mea­ sure attitudes, while transaction logs measure a specific form of behavior.”2 in both cases it is difficult, in many instances, to draw clear conclusions from either method. circulation figures, on the other hand, measure a more narrowly defined level of patron success. circulation is a discrete output that is the direct result of patrons’ initiated interaction with one or many library collections, one or many levels of library technology. with the recent advent of such enhanced opac functionality as patron­placed holds on items from broader and broader catalogs, online catalogs now more than ever not only serve as search mechanisms but also as ways for patrons to directly obtain materials from multiple sources. it follows that an investigation of the possible effects such enhancements may have on general circulation trends is warranted. ■ literature review during the mid­to­late 1980s, transaction­log analysis was introduced as an inexpensive and easy method of looking at opac use in primarily the academic library environment. peters’s transaction­log survey of more than thirteen thousand searches executed over a five­ month period at the university of missouri­kansas city remains particularly instructive today for its large sample and transferable design as well as its interpreta­ tion of results.3 here analysis was broken into two phases. in phase one, usage patterns by search type and failure rates as measured by zero hits were examined as dependent vari­ ables with search type as the independent variable in a comparison study. phase two took this one step further in the assigning of what peters termed “probable cause” of zero hits. these probable causes fell into patterns that, in turn, resulted in the identification of fourteen discernable error types that included such things as typographical errors and searches for items not in the catalog. once again, search type formed the independent variable while error type shaped the dependent variable in a simple study of error types as a percentage of total searches. peters found that users rarely employed truncation or any advanced feature searches and that failures were due primarily to such consistent erroneous search patterns as typographical errors and misspellings. more importantly, however, he cogently reassessed transaction­log analysis as a tool and critiqued its limitations. zero hits, for exam­ ple, need not necessarily construe failure when a patron performs a quality search and finds that the library simply does not own the title in question. 
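as an aside, the tally peters describes (zero-hit rates broken out by search type) amounts to a simple aggregation over a transaction log. the sketch below assumes a hypothetical csv export with "search_type" and "hits" columns; it is not a reconstruction of peters's actual tooling.

```python
"""zero-hit rates by search type from a (hypothetical) transaction-log export."""
import csv
from collections import Counter

def zero_hit_rates(log_path):
    totals, zeros = Counter(), Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            stype = row["search_type"]          # e.g. author, title, subject, keyword
            totals[stype] += 1
            if int(row["hits"]) == 0:
                zeros[stype] += 1
    return {s: zeros[s] / totals[s] for s in totals}

# zero_hit_rates("opac_log.csv") might return {"subject": 0.41, "title": 0.22, ...}
```

as peters's own caveat makes clear, a high rate for one search type still says nothing by itself about why those searches failed.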
concerning intelligible outputs from transaction­log study, peters found that, “if the user is seen as carrying on a dialog of sorts with the online catalog, then it could be said that most transaction logs record only half of the conversa­ tion. more information about the system’s response to the user’s queries would help us better understand why patrons do what they do.”4 a look at subsequent transaction­log analyses into the 1990s reveals somewhat differing research approaches yet strikingly similar results. wallace (1993) duplicated peters’s methods at eleven terminals within the university of colorado library system.5 her efforts spanned twenty hours of search monitoring and resulted in 4,134 logged searches. these were defined by carl system search type, (e.g., word, subject), then analyzed as cumulative totals and percentages of all searches. in this case, how­ michael j. bennett michael j. bennett (mbennett@cwmars.org) is digital initiatives librarian, c/w mars library network, worcester, massachusetts. article title | author 37opac design enhancements | bennett 37 ever, failed searches (peters’s zero hits) were eliminated entirely from the sample as wallace focused primarily on patterns of completed searches and did not concern her­ self with questions of search success or failure, thus limit­ ing the scope of her findings. among searches analyzed, results were comparable to peters’s.6 in keeping with peters’s line of thinking, wallace remarked, intriguing vagaries in human behavior during an infor­ mation search process continue to stymie researchers’ efforts to understand that process. . . . current, widely used and described guidelines, rules and principles of searching simply do not take into account important aspects of what is really going on when an individual is using a computer to search for information.7 in 1998, ciliberti et al. conducted a materials avail­ ability study of 441 opac searches at adelphi university over a three­week period during fall semester.8 their work combined kantor’s branching­analysis methodol­ ogy with transaction­log analysis of opac use in order to better understand if users obtain the materials they need through the online catalog.9 sampling was accom­ plished during random open hours and drew informa­ tion from undergraduate, graduate, and faculty users. survey forms included questions of what patrons were searching for. forms were then picked randomly by staff for re­creation. the study was unclear as to the actual design of these forms and their queries. as a result their effectiveness remains questionable. a seven­category scheme was developed to code search failures that closely followed kantor’s branching analysis, where the concept of errors extends beyond just opac and its design to include such things as library collection devel­ opment and circulation practices.10 the survey itself along with the loss of accuracy that can be expected from patrons attempting to describe their searches on paper, then having these same searches re­created by research staff lead this author to question the data’s validity. as peters has noted, surveys are good for assessing opac users’ opinions but not necessarily their behavior.11 it would seem that in this instance the tool did not fit the task. this study did, however, use transaction logs after the initial survey analysis and indeed found discrepancies between the self­report (survey) and actual transaction­log data. 
search errors were subsequently categorized as pre­ viously described.12 though branching analysis is adept at examining on a holistic, entire­library scale (e.g., the ques­ tion of why patrons are not able to obtain materials), the method’s inherent breadth of focus does not lend itself to fine scrutiny of opac design issues in and of themselves. further refinement of the transaction­log analysis methodology may be seen in blecic’s et al. four­year longi­ tudinal study of opac use within the university of illinois library system.13 once again, failed searches, termed “zero postings” by the authors, were examined as dependent variables and percentages of the total number of searches and were used as a control. reasons for zero postings (e.g., searches missing search statements, author names entered in incorrect order) fell into seven separate catego­ ries. subsequent transaction­log sets were then culled after three incremental opac enhancements. enhancements included redesigns of general introductory and explain screens. z­test analysis of the level of equality between percentages of zero postings from log set to log set was then made in order to assess whether or not the enhance­ ments had any affect on diminishing said percentages and thus improving searching behavior. what blecic et al. found was temporary improve­ ment in patron searches followed by an unexpected lowering of patron performance over time. confounding attributes to the study include its longitudinal nature in an academic setting where user groups are not constant but variable. sadly, no attempt at tracking such possible changes in user populations was made. also of note was the fact that, as time passed, the command­based opac was increasingly being surrounded by web­based journal database search interfaces that did not require the use of sophisticated search statements and arguments. as users became accustomed to this type of searching, their com­ mand syntax skills may have suffered as a result.14 merits of the study include its straightforward design, logical data analysis, and plausible conclusions. longitudinal studies, though prone to the confound­ ing variables described, nevertheless form a persuasive template for further research into how incremental opac enhancements affect actual opac use over time. variations of transaction­log analysis also include the purely experimental. thomas’s 2001 simulation study of eighty­two first­year undergraduates at the university of pittsburg utilized four separate experimental screen inter­ faces.15 these interfaces included one that mimicked the current catalog with data labels and brief bibliographic displays, a second interface with the same bibliographic display but no data labels, and a third that contained the data labels but modified the brief display to include more subject­oriented fields. a fourth interface viewed the same brief displays as the third group but with the labels removed. users were pretested for basic demographic informa­ tion and randomly assigned to one of the four experi­ mental interface groups. each group was then given the same two search tasks. for the first task, users were asked to select items that they would examine further for a hypothetical research paper on big­band music and the music of duke ellington. the second task involved asking participants to examine twenty bibliographic records and to decide whether they would choose to look into these records further. 
participants were then asked to identify the data elements used to inform their 38 information technology and libraries | march 200738 information technology and libraries | march 2007 relevance choices. resulting user behavior was subse­ quently tracked through transaction logs. for thomas’s experimental purposes, though, trans­ action logs took on a higher level of sophistication than in earlier comparative studies. here participants’ actions were monitored with a greater level of granularity. quantitative data were tracked for screens visited, time spent viewing them, total number of screens, total number of bibliographic citations examined at each level of speci­ ficity, and total time it took to complete the task. because of the obtrusive nature of the project, a third party was hired to administer the experiment. chi­square analysis of demographic data found no significance among partici­ pant groups in terms of their experience in using comput­ ers, online catalogs, or prior knowledge of the problem topic. this important analysis allowed the researchers a higher level of confidence in their subsequent findings. results in many instances were, however, inconclu­ sive. factors impairing the clarity of conclusions included the number of variables analyzed and the artificiality of the test design itself. thomas comments on one particular example of this: one of the fields that previous researchers said that library users found important was the call number field. obviously, without the call number, locating the actual item on the shelf is greatly complicated. in this experi­ ment, however, participants were not asked to retrieve the items they selected; thus, their perceived need for the call number may well have been mitigated.16 here is further evidence that a study of opac activity viewed in the context of actual outcomes, namely circula­ tion, is a logical approach to consider. most recently, graham at the university of lethbridge, alberta, examined opac subject searching and no­ hit results and considered two possible experimental enhancement types in order to allow users the ability to conduct more accurate searches.17 over a one­week period, 1,521 no­hit subject searches were first sampled and placed into nine categories by error type. subtotals were then expressed as percentage distributions of the total. a similar examination of 37,987 no­hit findings was also made over the course of four calendar years, form­ ing a longitudinal approach. percent distribution of error types from the two studies were then compared and were found to be similar with “non­library of congress subject headings” being the predominant area of concern. graham then attempted to improve subject searching by systematically enhancing the catalog in two ways. first, cross­references were created based upon the original no­ hit search term and linked to existing library of congress subject headings (lcshs) that graham interpreted as appropriate to the searcher’s original intentions. second, in instances where the original search could not be easily linked to an existing lcsh, a pathfinder record was cre­ ated that suggested alternate search strategies. all total, 10,520 new authority records and 2,312 pathfinder records were created over the course of the longitudinal study.18 the experiment, unfortunately, only went this far. no attempt was subsequently made to test whether these two methods of adding value to an existing opac search interface made a difference in users’ experiences. 
though creative in its suggested ameliorations to no­hit searches, the study also lacked any statistical testing of comparative data among sample years. possible problematic design issues, such as the relative complexity of pathfinders and how this might affect their end use were discussed but never tested through the analysis of real outcomes. in summary, major weaknesses of the transaction­log analysis model as demonstrated through the literature include: 1. lack of standardization among general study methodologies. 2. lack of standardization of opacs themselves: command structure and screen layout differ among software vendors. 3. lack of standards on measurable levels of search “success” or “failure.” while the following study of opac design enhance­ ments in the public library consortium environment did not directly address the first two points of emphasis, it was this author’s expectation that the lack of stan­ dardized notions of opac search success or failure found throughout the literature may be better addressed through a longitudinal analysis of discrete circulation and ill statistics. in this way, these quantifiable outcomes, both the direct results of patron initiation, would better assume clearer measures of patron success or failure in opac end use. ■ purpose and methodology in recent years, both academic and public libraries have invested substantial capital in improving opac design and automated systems. to what extent have these improvements affected the use of library materials by public library patrons? in order to better examine the question, this study tracked, over a seven­year period dating back from july 1998 through june 2005, the circulation and systemwide holds statistical trends of sixteen member libraries of c/ w mars, a massachusetts automated library network of 140 libraries. during this time a number of discrete, incre­ mental opac modifications granted patrons the ability to accomplish tasks remotely through the opac that previ­ ously had required library staff mediation. among these article title | author 3�opac design enhancements | bennett 3� changes, the initiation of intra­consortium (c/w mars) patron­placed holds, and the subsequent introduction of a link from the existing opac to the massachusetts virtual catalog (nine massachusetts consortiums, four university of massachusetts system libraries) were examined. this author hypothesized that such opac enhance­ ments that allow for broader choices of patron­placed holds would result in increases in both total circulation and total network transfers (ill) of library materials one year after initial enhancement adoption. as both total cir­ culation and total ill grew, it was hypothesized that ill as a percent of total circulation would likewise increase due to the fact that each opac enhancement was targeted directly toward facets of ill procurement. opac enhancements followed the schedule below: 1. general c/w mars network systemwide holds (requests mediated through library staff only), november 2000 2. patron­placed holds (request button placed on c/ w mars opac screens), december 2002 3. c/w mars participation in the massachusetts virtual catalog (additional button for pass through opac searches and requests from c/w mars catalog into the massachusetts virtual catalog), august 2004 these dates served as independent variables in a study of separate dependent variables (total circulation and total ills received) for all eight libraries one year after initial adoption of a new enhancement. 
for the sake of continu­ ity the terms holds and ills were used interchangeably throughout this examination. t­test comparisons to fig­ ures from the year prior to enhancement were then made for statistical significance. in addition, ills received as a percentage of total circulation (dependent variable) for all fifteen libraries one year after initial adoption of a new enhancement were also calculated and compared to the year prior to enhancement through z­test analysis. libraries chosen were a random sample from both central and western geographic regions of the network. sampled institutions did not go through any substantial renovations, drastic open hours changes, or closures dur­ ing the study period in order to better avoid potential con­ founding variables that may have skewed the resulting data. raw circulation and ill figures were taken directly from the massachusetts board of library commissioners’ (mblc) data files for fiscal years 1999 through 2004.19 in the mblc’s data files, the following fields, sorted by library, correlated to this study’s statistical reporting: “dircirc” = “circulation” “loan from” = “ill” as fiscal year (fy) 2005 figures for circulation and ill had not yet been compiled by mblc at the time of this writing, these statistics were in turn taken directly from reports run off of c/w mars’s network servers. it should be noted that similar c/w mars reports are distributed and used by the consortium’s libraries them­ selves each fiscal year for reporting circulation and ill statistics to mblc. raw data by library were entered into microsoft excel spreadsheets. totals for circulation and ills received for all libraries by fy of opac enhancement were totaled and then compared to fy data prior to enhancement as a percent change value. excel’s data analysis tools were then employed to run t­tests (paired two sample for means) in tables 1 through 5 to analyze the level of change for significance from one sample to the next in both total circulation and total ills. (all tables and charts can be found in appendix following article.) tests for sig­ nificance employed two­tailed t­tests with an alpha level set to .05. raw data for these same libraries across identical study years were also entered into subsequent spread­ sheets (tables 6 through 10) for additional z­tests (two samples for means) to analyze the level of change for significance from one fy sample to the next in ills received as a percentage of total circulation. here tests for significance employed two­tailed z­tests with an alpha level set to .05. ■ results and discussion the results of a sixteen­library, seven­year longitudinal study of total circulation and total ills­received statistics are outlined in tables 1 through 5, charts 1 through 10. in addition, an analysis of ills received as a percentage of total circulation during this same time period among sampled libraries is represented in tables 6 through 10. over the course of the study a total of 22,277,245 circula­ tion and 624,286 ill transactions were examined from july 1998 through june 2005. yearly comparisons in total circulation and total ills received from fy ’99 to fy ’00 were made to analyze the level of changes in circulation and ill statistics between years before any opac ill enhancements were under­ taken. as such these numbers gave insight into what changes, if any, normally occur in circulation and ill fig­ ures prior to a schedule of substantial opac ill enhance­ ments. 
although the year­to­year comparisons over the course of subsequent enhancement rollouts were made to test for the statistical significance of the year prior and following a particular functionality addition, the ’99 to ’00 40 information technology and libraries | march 200740 information technology and libraries | march 2007 comparison was made to form a control of what circula­ tion and ill trends may look like between years of no drastic workflow or design changes. results showed that this yearly comparison prior to the beginning of opac enhancements (table 1, charts 1 and 2) showed no significant change from one year to the next in total circulation (t = 1.81, p > 0.05) or total ills received (t = ­0.76, p > 0.05). circulation from ’99 to ’00 declined slightly by 3.42 percent while total ills received increased 3.35 percent. the mblc’s available retrospec­ tive data set currently only goes back to fy ’99, so a deeper understanding beyond this two­year comparison of normal year­to­year trends was impossible to achieve. yet data from this sample suggest that both circulation and ills may trend statistically flat from one year of little if any alteration of ill design to the next. additionally, comparisons of the percent of total ills received to total circulation were made between ’99 and ’00 (as will be seen in table 6) and were found to be insignificantly different (z = ­0.23, p > 0.05). ills received made up 0.61 percent of total circulation in fy ’99 and 0.65 percent of total circulation in fy ’00. during fy ’01 (november 2001), c/w mars rolled out automated systemwide holds functionality whereby library staff were first able to place patron requests for materials at other c/w mars member libraries through the consortium’s automated circulation system. up until this point, holds (ills) were placed primarily by staff through e­mail or faxed requests from one ill depart­ ment to another. patrons would request material either verbally with staff or through the submission of a paper or electronic form. staff would then look up the item in the electronic catalog and make the request. with the advent of systemwide holds, staff still accepted requests in a similar fashion, but instead of using the fax or e­mail, they began to place requests directly into the network’s innovative millennium circu­ lation clients. from there, the automated system not only randomly chose the lending library within the system but also automatically queued paging slips at the lending library for material that would subsequently be sent in transit to the borrowing location. by this time in the network’s development, opac had also graduated from a character­based telnet system to a smoother web design. but the catalog, in terms of directly assisting in the placing of ill requests, func­ tioned as it always had—it was still individually a search­ ing mechanism. the introduction of systemwide holds led to the sec­ ond largest jump in ill figures out of all comparative samples (table 2, chart 4). interestingly enough, the con­ siderably significant 127.23­percent gain in ill activity from fy ’00 to fy ’01 (t = ­4.07, p < 0.05) did not translate into a significant increase in total circulation. in fact, cir­ culation declined during this period, not significantly (t = 1.87, p > 0.05), but by 2.40 percent nonetheless (table 2, chart 5). a comparison of the percent of ills to total circulation from fy ’00 to fy ’01 (table 7) indicated a sig­ nificant increase of 0.65 percent to 1.52 percent (z = ­4.20, p < 0.05). 
more on the possible effects that rising levels of ills may have on circulation will be touched upon later. though no statistical evaluations were made between fy '01 and fy '02 (as no novel ill changes were made over this period), it should be noted that during fy '02 the network first gave patrons the ability, through the opac, to log into their own accounts remotely. patrons were able to set up a personal identification number and view such things as a list of their checked-out items. patrons were also allowed to place checks next to such items and to renew these items remotely. fy '03 saw the first direct ill enhancement to the opac. during this year patrons were first given the opportunity to place ill requests of their own (patron-placed holds) for material found in the catalog through the addition of an opac screen request button. up until this time, all material requests had been mediated by library staff. comparative total circulation results from the year before enhancement to fy '03 (table 3, chart 5) showed a modest but significant 4.18 percent increase (t = -2.94, p < 0.05). ills-received figures (table 3, chart 6), however, jumped by a considerable 25.58 percent margin (t = -4.66, p < 0.05), strongly suggesting that the opac request-button addition and its facilitation of patron-placed holds had a positive effect upon total ill activity, as was hypothesized. finally, total ills received as a percentage of total circulation increased slightly from fy '02 (2.52 percent) to fy '03 (3.04 percent) (table 8) but did not represent a significant shift (z = -1.51, p > 0.05). the last augmentation to the network's opac design that this study examined was an additional link for ills through the massachusetts virtual catalog. the massachusetts virtual catalog at the time of this study was an online union catalog of nine massachusetts network consortia and four university of massachusetts system libraries. unlike the previous request-button enhancement that allowed for seamless patron-placed holds within the c/w mars catalog, the massachusetts virtual catalog link was not a button but a descriptive hyperlink ("can't find the title you want here? try the massachusetts virtual catalog next!") from the network's opac to the virtual catalog's own dedicated opac interface. once there, patrons were required to log in to the virtual catalog and re-create their search queries from scratch, as previous searches were not automatically passed through to the second catalog. in essence, the virtual catalog acted as an additional step for patrons to take beyond c/w mars's list of holdings to broaden their search for materials that the network's member libraries did not own. comparative figures for total circulation between fy '04 and fy '05 (table 4, chart 7), when the virtual catalog link was added to the c/w mars opac screen, showed circulation down an insignificant 2.04 percent (t = 0.97, p > 0.05), which ran counter to hypothesized expectations. total ills received between fy '04 and fy '05 (table 4, chart 8), however, rose 30.85 percent, which proved to be a highly significant increase (t = -7.03, p < 0.05).
additionally, ills as a percent of total circulation rose from 4.70 percent in fy '04 to 6.27 percent in fy '05 (table 9), which was statistically significant (z = -3.28, p < 0.05) and pointed not only to gains in ill itself after the introduction of the virtual catalog link but also to the ever-increasing proportion of total circulation that ill activity accounted for. the final statistical comparison in this study was a look at what cumulative effect, if any, both opac enhancements may have had from the year before the first enhancement's rollout (the patron-placed holds request button) to one year after the latest addition (the virtual catalog hyperlink from the opac). in turn, comparative numbers for circulation and ills between fy '02 and fy '05 were examined. total circulation over this time (table 5, chart 9) increased insignificantly by 3.46 percent (t = -1.47, p > 0.05). total ills received (table 5, chart 10), however, increased by 157.47 percent, the largest significant increase of any comparison (t = -7.20, p < 0.05). ills as a percent of total circulation also increased significantly from 2.52 percent in fy '02 to 6.27 percent in fy '05 (z = -7.71, p < 0.05) (table 10). if one steps back and examines the various comparisons discussed up to this point, certain trends become evident. over the course of the seven-year study, total circulation remained relatively flat, oscillating slightly from year to year, with only one significant increase, which occurred after the introduction of patron-placed holds in fy '03. these results, excluding fy '03, ran against hypothesized expectations that predicted that as ill enhancements were rolled out, correspondingly significant increases in circulation would result. the fy '99 to fy '00 control comparison of total ills received, made before the advent first of network systemwide holds and then of a succession of opac design enhancements allowing a broader range of patron-initiated ills, suggested that these totals run statistically flat from one year to the next. with the advent of systemwide holds, however, the ill picture began to change dramatically with a significant increase in total ills. this was followed by significant increases in ill activity in each study year that came after an opac ill enhancement. these results pointed toward the substantial effect that these enhancements had on total ill activity and supported hypothesized expectations. when such opac rollouts were examined as a cumulative influence, comparing ill levels of the most recent fiscal year (fy '05) to the year before their initial advent (fy '02), the positive effect that such enrichments had, not only on total ills but also on ills as a share of total circulation, becomes clearest. for it is through this comparison that it was found that not only did total ills increase significantly but that ills as a percentage of total circulation also increased significantly from the time before the first opac enhancement to the present. total circulation was surprisingly impervious to change and ran statistically flat during this time. it is clear from this longitudinal study that incrementally granting patrons access to online tools with which to initiate such traditional library business as ills spurs large and significant increases in such activity. in other words, these online tools are not ignored but are intellectually and literally grasped.
what may be surprising, however, is the degree to which ill has increased as a result of them, to a point where ill has not only taken up a significantly greater proportion of total circulation than ever before but also appears to be changing the very nature of circulation itself. future studies may include a deeper examination of the circulation and ill statistical picture farther back in time than this investigation covers to better clarify trends leading up to such major enhancement rollouts. also, similar longitudinal studies from different consortia environments may shed further light on the evidence discussed throughout this writing. consortia are uniquely poised to offer large statistical sample sizes and standardized workflows within their network-wide ill and circulation software packages and automated statistical programs. this, in turn, results in high-quality, consistent data samples from heterogeneous library sources that are relatively uncorrupted by scattershot recording methods and differing circulation and ill methodologies. finally, a future look at the effects that similar opac ill enhancements may have on borrowing trends beyond general raw transactional figures is warranted. chris anderson, for example, has recently commented on long tail statistical analysis and its relation to library catalogs; here outwardly shifting demand curves for library materials are hypothesized as collections become more visible and interconnected through the web.20 in a similar vein, a more granular examination of such concepts as possible circulation and ill-activity trends in terms of discrete material types borrowed, patron types who borrow, or a cross-tabulation of these data points would appear to be a fertile next step toward a greater knowledge of ills and circulation as a whole.
references
1. t. peters, "when smart people fail: an analysis of the transaction log of an online public access catalog," the journal of academic librarianship 15, no. 5 (1989): 267–73.
2. ibid., 272.
3. ibid.
4. ibid., 272.
5. p. wallace, "how do patrons search the online catalog when no one's looking? transaction-log analysis and implications for bibliographic instruction and system design," rq 33, no. 2 (1993): 239–43.
6. peters, "when smart people fail."
7. wallace, "how do patrons search the online catalog when no one's looking?" 239.
8. a. ciliberti et al., "empty handed? a material availability study and transaction-log analysis verification," the journal of academic librarianship 24, no. 4 (1998): 282–89.
9. p. kantor, "availability analysis," journal of the american society for information science 27, nos. 5–6 (1976): 311–19.
10. ciliberti et al., "empty handed? a material availability study and transaction-log analysis verification."
11. peters, "when smart people fail."
12. ciliberti et al., "empty handed? a material availability study and transaction-log analysis verification."
13. d. blecic et al., "a longitudinal study of the effects of opac screen changes on searching behavior and searcher success," college & research libraries 60, no. 6 (1999): 515–30.
14. ibid.
15. d. thomas, "the effect of interface design on item selection in an online catalog," library resources & technical services 45, no. 1 (2001): 20–46.
16. ibid., 41.
17. r. graham, "subject no-hits searches in an academic library online catalog: an exploration of two potential ameliorations," college & research libraries 65, no. 1 (2004): 36–54.
18. ibid.
19. massachusetts board of library commissioners, "public library data, data files," 2005, http://www.mlin.lib.ma.us/advisory/statistics/public/index.php (accessed oct. 13, 2005).
20. c. anderson, "the long tail," wired magazine 12, no. 10 (2004): 170–77; "q&a with chris anderson," oclc newsletter, 2005, no. 268, http://www.oclc.org/news/publications/newsletters/oclc/2005/268/interview.htm (accessed july 20, 2006).
appendix a: tables and charts
table 1. yearly comparison prior to the beginning of ill opac enhancements
table 2. general systemwide holds implementation (adopted 11/00)
table 3. opac design enhancement: patron-placed holds (adopted 12/02)
table 4. opac design enhancement: patron-placed massachusetts virtual catalog holds (adopted 8/04)
table 5. opac design enhancements: "cumulative effect" (fy '02 to fy '05)
table 6. yearly comparison prior to the beginning of ill opac enhancements, ill received as a percentage of total circulation
table 7. general systemwide holds (adopted 11/00), ill received as a percentage of total circulation
table 8. opac design enhancement: patron-placed holds (adopted 12/02), ill received as a percentage of total circulation
table 9. opac design enhancement: patron-placed massachusetts virtual catalog holds (adopted 8/04), ill received as a percentage of total circulation
table 10. opac design enhancements: "cumulative effect" (fy '02 to fy '05), ill received as a percentage of total circulation
chart 1. circulation comparison prior to any ill opac enhancement (fy '99 to fy '00)
chart 2. ill received comparison prior to any ill opac enhancement (fy '99 to fy '00)
chart 3. circulation comparison before and after general systemwide holds implementation (adopted 11/00)
chart 4. holds received comparison before and after general systemwide holds implementation (adopted 11/00)
chart 5. circulation comparison before and after patron-placed holds opac enhancement (adopted 12/02)
chart 6. holds received comparison before and after patron-placed holds opac enhancement (adopted 12/02)
chart 7. circulation comparison before and after massachusetts virtual catalog opac enhancement (adopted 8/04)
chart 8. holds received comparison before and after massachusetts virtual catalog opac enhancement (adopted 8/04)
chart 9. circulation comparison, opac enhancements "cumulative effect" (fy '02 to fy '05)
chart 10. ill comparison, opac enhancements "cumulative effect" (fy '02 to fy '05)
building pathfinders with free screen capture tools
patrick griffis
this article outlines freely available screen capturing tools, covering their benefits and drawbacks as well as their potential applications. in discussing these tools, the author illustrates how they can be used to build pathfinding tutorials for users and how these tutorials can be shared with users.
the author notes that the availability of these screen capturing tools at no cost, coupled with their ease of use, provides ample opportunity for low-stakes experimentation by library staff in building dynamic pathfinders to promote the discovery of library resources. one of the goals related to discovery in the university of nevada las vegas (unlv) libraries' strategic plan is to "expand user awareness of library resources, services and staff expertise through promotion and technology."1 screencasting videos and screenshots can be used effectively to show users how to access materials using finding tools in a systematic, step-by-step way. screencasting and screen capturing tools are becoming more intuitive to learn and use and can be downloaded for free. as such, these tools are becoming an efficient and effective method for building pathfinders for users. one such tool is jing (http://www.jingproject.com), freeware that is easy to download and use. jing allows short screencasts of five minutes or less to be created and uploaded to a remote server on screencast.com. once a jing screencast is uploaded, screencast.com provides a url for the screencast that can be shared via e-mail or instant message or on a webpage. another function of jing is recording screenshots, which can be annotated and shared by url or pasted into documents or presentations. jing serves as an effective tool for enabling librarians working with students via chat or instant messaging to quickly create screenshots and videos that visually demonstrate to students how to get the information they need. jing stores the screenshots and videos on its server, which allows those files to be reused in subject or course guides and in course management systems, course syllabi, and library instructional handouts. moreover, jing's file storage provides an opportunity for librarians to incorporate tutorials into a variety of spaces where patrons may need them, in a manner that does not require internal library server space or work from internal library web specialists. trailfire (http://www.trailfire.com) is another screen-capturing tool that can be utilized in the same manner. trailfire allows users to create a trail of webpage screenshots that can be annotated with notes and shared with others via a url. such trails can provide users with a step-by-step slideshow outlining how to obtain specific resources. when a trail is created with trailfire, a url is provided to share. like jing, trailfire is free to download and easy to learn and use. wink (http://debugmode.com/wink) was originally created for producing software tutorials, which makes it well suited for creating tutorials about how to use databases. although wink is much less sophisticated than expensive software packages, it can capture screenshots and add explanation boxes, buttons, titles, and voice to your tutorials. screenshots are captured automatically as you use your computer, on the basis of mouse and keyboard input. wink files can be converted into very compressed flash presentations and a wide range of other file types, such as pdf, but avi files are not supported. as such, wink tutorials converted to flash have a fluid movie feel similar to jing screencasts, but wink tutorials can also be converted to more static formats like pdf, which provides added flexibility. slideshare (http://www.slideshare.net) allows for the conversion of uploaded powerpoint, openoffice, or pdf files into online flash movies.
an option to sync audio to the slides is available, and widgets can be created to embed slideshows onto websites, blogs, subject guides, or even social networking sites. any of these tools can be utilized for just-in-time virtual reference questions in addition to the common use of just-in-case instructional tutorials. such just-in-time screen capturing and screencasting offer a viable solution for providing more equitable service and teachable moments within virtual reference applications. these tools allow library staff to answer patron questions via e-mail and chat reference in a manner that allows patrons to see the processes for obtaining information sources. demonstrations that are typically provided in face-to-face reference interactions and classroom instruction sessions can be provided to patrons virtually. the efficiency of this practice is that it is simpler and faster to capture and share a screencast tutorial when answering virtual reference questions than to explain complex processes in written form. additionally, the fact that these tools are freely available and easy to use provides library staff the opportunity to pursue low-stakes experimentation with screen capturing and screencasting. the primary drawback to these freely available tools is that none of them provides a screencast that allows for both voice and text annotations, unlike commercial products such as camtasia and captivate. however, tutorials rendered with these freely available tools can be repurposed into a tutorial within commercial applications like camtasia studio (http://www.techsmith.com/camtasia.asp) and adobe captivate (http://www.adobe.com/products/captivate/). as previously mentioned, these easy-to-use tools allow screencast videos and screenshots to be integrated into a variety of online spaces. a particularly effective type of online space for potential integration of such screencast videos and screenshots is library "how do i find . . ." research help guides. many of these "how do i find . . ." research help guides serve as pathfinders for patrons, outlining processes for obtaining information sources. currently, many of these pathfinders are in text form, and experimentation with the tools outlined in this article can empower library staff to enhance their own pathfinders with screencast videos and screenshot tutorials.
patrick griffis (patrick.griffis@unlv.edu) is business librarian, university of nevada las vegas libraries.
reference
1. "unlv libraries strategic plan 2009–2011," http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 30, 2009): 2.
statement of ownership, management, and circulation
information technology and libraries, publication no. 280-800, is published quarterly in march, june, september, and december by the library and information technology association, american library association, 50 e. huron st., chicago, illinois 60611-2795. editor: marc truitt, associate director, information technology resources and services, cameron library, university of alberta, edmonton, ab t6g 2j8, canada. annual subscription price, $65. printed in u.s.a. with periodical-class postage paid at chicago, illinois, and other locations. as a nonprofit organization authorized to mail at special rates (dmm section 424.12 only), the purpose, function, and nonprofit status for federal income tax purposes have not changed during the preceding twelve months. extent and nature of circulation (average figures denote the average number of copies printed each issue during the preceding twelve months; actual figures denote the actual number of copies of the single issue published nearest to the filing date: september 2009 issue). total number of copies printed: average, 5,096; actual, 4,751. mailed outside-county paid subscriptions: average, 4,090; actual, 3,778. sales through dealers and carriers, street vendors, and counter sales: average, 430; actual, 399. total paid distribution: average, 4,520; actual, 4,177. free or nominal rate copies mailed at other classes through the usps: average, 54; actual, 57. free distribution outside the mail (total): average, 127; actual, 123. total free or nominal rate distribution: average, 181; actual, 180. total distribution: average, 4,701; actual, 4,357. office use, leftover, unaccounted, spoiled after printing: average, 395; actual, 394. total: average, 5,096; actual, 4,751. percentage paid: average, 96.15; actual, 95.87. statement of ownership, management, and circulation (ps form 3526, september 2007) filed with the united states post office postmaster in chicago, october 1, 2009.
foreword
the editorial board of the journal of library automation is pleased to pay tribute to frederick g. kilgour who, with the able assistance of his assistant editor, eleanor m. kilgour, so firmly established this periodical and set its standards so high. especially in view of the fact that in these first years of journal publication mr. kilgour was also designing and implementing the complex system which is the ohio college library center, his achievement as first editor was remarkable. to him the information science and automation division of the american library association owes a great debt.
as library automation moves further into the seventies, the context of its existence changes. ever-increasing fiscal pressures have required economic justification for every alteration of traditional practice. the mere availability of equipment, of programs and tested system design, even of skilled and experienced manpower, can no longer be considered enough. novelty, the magic word "innovation," seldom now casts a spell on those who control institutional budgets. increasingly, in the issues of this journal, we hope that emphasis will be placed on reviews of experience, retrospective evaluations of operation rather than optimistic projections made in the first bright mornings of system design. we must have reports, if not of failures, at least of the alterations and accommodations enforced on operational systems by experience and the heavy hand of time. it is our further hope that the journal will receive more reports from public and school libraries which indicate an increasing dedication, in automation explications, to the social and educational goals of those institutions. -ajg
preparing locally encoded electronic finding aid inventories for union environments: a publishing model for encoded archival description
plato l. smith ii
plato l. smith ii (psmithii@fsu.edu) is digital initiatives librarian at florida state university libraries, tallahassee.
this paper will briefly discuss encoded archival description (ead) finding aids; the workflow and process involved in encoding finding aids using the ead metadata standard; our institution's current publishing model for ead finding aids; current ead metadata enhancement; and new developments in our publishing model for ead finding aids at florida state university libraries. for brevity and within the scope of this paper, fsu libraries will be referred to as fsu, an electronic ead finding aid and/or archival finding aid will be referred to as an ead or eads, and locally encoded electronic ead finding aid inventories will be referred to as eads @ fsu.
■ what is an ead finding aid?
many scholars, researchers, and learning and scholarly communities are unaware of the existence of rare, historic, and scholarly primary source materials such as inventories, registers, indexes, archival documents, papers, and manuscripts located within institutions' collections/holdings, particularly special collections and archives. a finding aid—a document providing information on the scope, contents, and locations of collections/holdings—serves as both an information provider and a guide for scholars, researchers, and learning and scholarly communities, directing them to the exact locations of rare, historic, and scholarly primary source materials within institutions' collections/holdings, particularly noncirculating and rare materials. the development of the finding aid led to the need for an encoding and markup language that was software/hardware independent, flexible, and extensible, and that allowed online presentation on the world wide web. in order to provide logical structure, content presentation, and hierarchical navigation, as well as to facilitate internet access to finding aids, the university of california–berkeley library in 1993 initiated a cooperative project that would later give rise to the development of the nonproprietary, sgml-based, xml-compliant, machine-readable markup encoding standard for finding aids, the encoded archival description (ead) document type definition (dtd) (loc, 2006a).
thus, an ead finding aid is a finding aid that has been encoded using encoded archival description and that should be validated against an ead dtd. the ead xml that produces the ead finding aid via an extensible stylesheet language (xsl) transformation should be checked for well-formedness with an xml validator (e.g., xml spy or oxygen) to ensure proper nesting of ead metadata elements. "the ead document type definition (dtd) is a standard for encoding archival finding aids using extensible markup language (xml)" (loc, 2006c). an ead finding aid includes descriptive and generic elements along with attribute tags that provide descriptive information about the finding aid itself, such as title, compiler, and compilation date, and about the archival material, such as the collection, record group, series, or container list. florida state university libraries has been creating locally encoded electronic encoded archival description (ead) finding aids using a note tab light text editor template and locally developed xsl style sheets to generate multiple ead manifestations in html, pdf, and xml formats online for over two years. the formal ead encoding descriptions and guidelines are developed with strict adherence to the best practice guidelines for the implementation of ead version 2002 in florida institutions (fcla, 2006), the manuscripts processing reference manual (altman & nemmers, 2006), and ead version 2002. an ead note tab light template is used to encode finding aids down to the collection level and create ead xml files. the ead xml files are transformed through xsl stylesheets to create ead finding aids for select special collections.
■ ead workflow, processes, and publishing model
the certified archivist and staff in special collections and a graduate assistant in the digital library center encode finding aids in the ead metadata standard using an ead clip library and template in the note tab light text editor, entering data for the various descriptive, administrative, and generic elements and attribute tags to generate ead xml files. the ead xml files are then checked for validity and well-formedness using xml spy 2006. currently, ead finding aids are encoded down to the folder level, but recent florida heritage project 2005–2006 grant funding has allowed selected special collections finding aids to be encoded down to the item level. currently, we use two xsl style sheets, ead2html.xsl and ead2pdf.xsl, to generate html and pdf formats, and we simply display the raw xml, thereby rendering ead finding aids as html, pdf, and xml and presenting these manifestations to researchers and end users. the ead2html.xsl style sheet used to generate the html versions was developed to specifications for the fsu seal, color, and display with input from the special collections department head. the ead2pdf.xsl style sheet used to generate pdf versions uses xsl-fo (formatting objects) and was also developed to layout and design specifications with input from the special collections department head. the html versions are generated using xml spy home edition with built-in xslt, and the pdf versions are generated using apache formatting objects processor (fop) software from the command line. ead finding aids, eads @ fsu, are available in html, pdf, and xml formats (see figure 1).
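as a minimal sketch of the validation and transformation steps just described, the lines below use python's lxml library rather than the xml spy and apache fop tools named in the article; the ead file name is hypothetical, while ead2html.xsl is the style sheet named above.

from lxml import etree

# parse the ead instance; the parse itself confirms well-formedness, and
# dtd_validation checks the file against the dtd named in its doctype declaration
parser = etree.XMLParser(load_dtd=True, dtd_validation=True)
ead_doc = etree.parse("MSS2003004.xml", parser)  # hypothetical file name

# apply the html transformation with the locally developed style sheet
transform = etree.XSLT(etree.parse("ead2html.xsl"))
html_result = transform(ead_doc)

with open("MSS2003004.html", "wb") as out:
    out.write(etree.tostring(html_result, pretty_print=True))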
the style sheets used, the ead authoring software, and the original eads @ fsu site are available via www.lib.fsu.edu/dlmc/dlc/findingaids.
■ enriching ead metadata
as ead standards and developments in the archival community advance, we had to begin enriching our ead metadata to prepare our locally encoded ead finding aids for future union catalog searching and opac access. the first step toward enriching the metadata of our ead finding aids was to run the rlg ead report card (oclc, 2008) on one of our ead finding aids. the test resulted in the display of missing required (req), mandatory (m), mandatory if applicable (ma), recommended (rec), optional (opt), and encoding analog (relatedencoding and encodinganalog attribute) metadata elements (see figure 2). the second test involved referencing the online archive of california best practice guidelines (oac bpg), specifically appendix b (cdl, 2005, ¶ 2), to create a formal public identifier (fpi) for our ead finding aids and make the ead fpis describing archives: a content standard (dacs) compliant. this second test resulted in the creation of our very first dacs-compliant ead formal public identifier. example: ftasu2003004.xml
the rlg ead report card and appendix b of the oac bpg together helped us modify our ead finding aid encoding template and workflow to enrich the ead document identifier metadata element, include missing mandatory ead metadata elements, and develop fpis for all of our ead finding aids.
figure 1. ead finding aids in html, pdf, and xml format
figure 2. rlg ead report card of xml ead file
prior to recent new developments in the publishing model for ead finding aids at fsu libraries, the ead finding aids in our eads @ fsu inventories could not be easily found using traditional web search engines, were part of the so-called "deep web" (prom & habing, 2002), and were "unidimensional in that they [were] based upon the assumption that there [was] an object in a library and there [was] a descriptive surrogate for that object, the cataloging record" (hensen, 1999). ead finding aids in our eads @ fsu inventories did not have a descriptive surrogate catalog record and lacked the relevant related-encoding and encoding-analog metadata elements within the ead metadata with which to facilitate "metadata crosswalks," that is, mapping one metadata standard to another to facilitate cross-searching. "to make the metadata in ead instance as robust as possible, and to allow for crosswalks to other encoding schemes, we mandate the inclusion of the relatedencoding and encodinganalog attributes in both the <eadheader> and <archdesc> segments" (meissner et al., 2002). incorporating an ead quality-checking tool such as the rlg report card, along with compliance with a content standard such as dacs, when authoring eads will assist in improving ead encoding and the ead finding aid publishing model.
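the checks below are a rough, homegrown illustration of this kind of quality control, not the rlg report card itself; element and attribute names follow ead 2002, and the file name is hypothetical.

from lxml import etree

doc = etree.parse("MSS2003004.xml")  # hypothetical file name
root = doc.getroot()
problems = []

# the <eadid> should carry a formal public identifier in its publicid attribute
eadid = root.find(".//eadid")
if eadid is None or not (eadid.text or "").strip():
    problems.append("missing or empty <eadid>")
elif not eadid.get("publicid"):
    problems.append("<eadid> lacks a publicid (formal public identifier)")

# relatedencoding attributes on <eadheader> and <archdesc> support crosswalks
for name in ("eadheader", "archdesc"):
    section = root.find(".//" + name)
    if section is None:
        problems.append(f"missing <{name}>")
    elif not section.get("relatedencoding"):
        problems.append(f"<{name}> lacks a relatedencoding attribute")

# encodinganalog attributes map individual elements to marc or dublin core fields
if not root.findall(".//*[@encodinganalog]"):
    problems.append("no encodinganalog attributes found")

print("\n".join(problems) if problems else "basic checks passed")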
■ some key issues with creating and managing ead finding aids
one of the major issues with creating and managing ead finding aids is the set of rules used for describing papers, manuscripts, and archival documents. the former set of rules used for providing consistent descriptions and anglo-american cataloguing rules (aacr) bibliographic catalog compliance for papers, manuscripts, and archival documents down to the collection level was archives, personal papers, and manuscripts (appm), which was compiled by steven l. hensen and published by the library of congress in 1983. however, the need for more descriptive granularity down to the item level, enhanced bibliographic catalog specificity, marc and ead metadata standards implementations and metadata standards crosswalks, and the inclusion of descriptors for archival material types beyond personal papers and manuscripts prompted the development of describing archives: a content standard (dacs), published in 2004, with a second edition published in 2007. "dacs [the u.s. implementation of international standards for the description of archival materials and their creators] is an output-neutral set of rules for describing archives, personal papers, and manuscripts collections, and can be applied to all material types" (pearce-moses, 2005). some international standards for describing archival materials are the general international standard archival description, isad(g), and the international standard archival authority record for corporate bodies, persons, and families, isaar(cpf). other issues with creating and managing ead finding aids include (the list is not exhaustive):
1. online presentation of finding aids
2. exposing finding aids electronically for searching
3. provision of a search interface to search finding aids
4. online public access catalog records (marc) and links to finding aids
5. finding aids linked to digitized content of collections
eads @ fsu exist in html for online presentation, pdf for printing, and xml for exporting, which allows researchers greater flexibility and options in the information-gathering and research process and has improved the way archivists communicate guides to archival collections to researchers, as compared with paper finding aids physically housed within institutions. eads @ fsu existed online in html, pdf, and xml formats for two years as static html documents and then moved to drupal (a mysql database with php) for about one year, which improved online maintenance but not researcher functionality. however, the purchase and upgrade of a digital content management system marked a huge advancement in the development of our ead finding aids implementation and thus resolved issues 1–3. researchers now have a single-point search interface to search eads @ fsu across all our digital collections/institutional repository (see figure 3); the ability to search within the finding aids via full-text indexing of pdfs; the option of brief (thumbnails with ead, htm, pdf, and xml manifestation icons), table (title, creator, and identifier), and full (complete ead finding aid dc record with manifestations) views of search results, which provides different levels of exposure of ead finding aids; and the ability to save or e-mail search results.
figure 3. online search gui for ead finding aids and digital collections within ir
future initiatives are underway to enhance the eads @ fsu implementation via the creation of ead marc records through a dublin core to marc metadata crosswalk, to deep link to ead finding aids via the 856 field in marc records, and to begin digitizing and linking to ead finding aids' archival content via the digital archival object <dao> ead element. <dao> is a "linking element that uses the attributes entityref or href to connect the finding aid information to electronic representations of the described materials. the <dao> and <daogrp> elements allow the content of an archival collection or record group to be incorporated in the finding aid" (loc, 2006b).
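a minimal sketch of the <dao> linkage quoted above, built with lxml: a digital archival object pointer is attached to a component's <did> so the finding aid can point to digitized content. the component structure and target url are purely illustrative.

from lxml import etree

component = etree.fromstring(
    '<c02 level="file"><did><unittitle>correspondence, 1942</unittitle></did></c02>'
)
dao = etree.SubElement(component.find("did"), "dao")
dao.set("href", "http://example.org/digital/correspondence-1942")  # hypothetical link
dao.set("linktype", "simple")
print(etree.tostring(component, pretty_print=True).decode())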
we have opted to create basic dublin core records of ead finding aids based on the information in the ead finding aid descriptive summary (front matter) first and then crosswalk to marc, but we are cognizant that this current workflow is subject to change in the pursuit of advancement. we are also seeking ways to improve the ead workflow and ead marc record creation through more communication and future collaboration with the fsu libraries cataloging department.
■ number of finding aids and percent of eads @ fsu
as of february 16, 2006, we had 700 collections with finding aids, of which 220 finding aids are electronic and encoded in html (31 percent of total finding aids). of the 220 electronic finding aids, 60 are available as html, pdf, and xml finding aids (about 27 percent of electronic finding aids are eads @ fsu). however, we currently have 63 ead finding aids available online in html, pdf, and xml formats.
■ new developments in publishing eads @ fsu
current eads @ fsu incorporate the recommendations from test 1 and test 2 (rlg bpg and dacs compliance) discussed earlier, and the digital content management system (digitool) creates a descriptive digital surrogate of the ead objects in the form of brief, basic dublin core metadata records for each ead finding aid, along with multiple ead manifestations (see figure 4).
figure 4. ead finding aids in ead (default), html, pdf, and xml manifestations
we have successfully built and launched our first new digital collection, fsu special collections ead inventories, in digitool 3.0 as part of the fsu libraries dlc digital repository (http://digitool3.lib.fsu.edu/r/), a relational-database digital content management system (dcms). digitool has an oracle 9i relational database management system backend and a searchable web-based gui; it provides a default ead style sheet that allows full-text searching of eads; it supports the marc, dc, and mets metadata standards and jpeg2000 (with built-in tools for images and thumbnails), as well as the z39.50 and oai protocols, which will enable resource discovery and exposure of eads @ fsu. you can visit the fsu special collections ead finding aids inventories at http://digitool3.lib.fsu.edu/r/?func=collections-result&collection_id=1076.
■ national, international, and regional aggregation of finding aids initiatives
rlg's archivegrid (http://archivegrid.org/web/index.jsp) is an international, cross-institutional search service aggregating primary source archival materials from more than 2,500 research libraries, museums, and archives, with a single-point interface for searching archival collections across research institutions. other international, cross-institutional searches of aggregated archival collections are:
■ intute: arts & humanities in the united kingdom, www.intute.ac.uk/artsandhumanities/cgi-bin/browse.pl?id=200025 (international guide to subcategories of archival materials)
■ archives made easy, www.archivesmadeeasy.org (guide to archives by country)
there are also some regional initiatives that provide cross-institutional searching of aggregations of finding aids:
■ publication of archival library and museum materials (palmm), http://palmm.fcla.edu (cross-institutional searches in florida; fsu participates)
■ virginia heritage: guides to manuscript and archival collections in virginia, http://ead.lib.virginia.edu/vivaead/ (cross-institutional searches in virginia)
■ texas archival resources online, www.lib.utexas.edu/taro/ (cross-institutional searches in texas)
■ online archive of new mexico, http://elibrary.unm.edu/oanm/ (cross-institutional searches in new mexico)
awareness of regional, national, and international aggregation of finding aids initiatives, and engagement in regional aggregation of finding aids, will enable consistent advancement in the development and implementation of eads @ fsu.
acknowledgments
the fsu libraries digital library center and special collections department, florida heritage project funding (fcla), chuck f. thomas (fcla), and robert mcdonald (sdsc) assisted in the development, implementation, and success of eads at fsu.
references
altman, b., & nemmers, j. (2006). manuscripts processing reference manual. florida state university special collections.
california digital library (cdl). (2005). oac best practice guidelines for encoded archival description, appendix b: formal public identifiers for finding aids. retrieved october 6, 2006, from www.cdlib.org/inside/diglib/guidelines/bpgead/bpgead_app.html#d0e2995.
digital library center, florida state university libraries. (2006). fsu special collections ead finding aids inventories. retrieved january 5, 2007, from http://digitool3.lib.fsu.edu/r/?func=collections-result&collection_id=1076.
florida center for library automation (fcla). (2004). palmm: publication of archival library and museum materials, archival collections. retrieved january 7, 2007, from http://palmm.fcla.edu.
florida center for library automation (fcla). (2006). best practice guidelines for the implementation of ead version 2002 in florida institutions (john nemmers, ed.). accessed april 21, 2008, at www.fcla.edu/dlini/openingarchives/new/floridaeadguidelines.pdf.
fox, m. (2003). the ead cookbook, 2002 edition. chicago: the society of american archivists. retrieved october 6, 2006, from www.archivists.org/saagroups/ead/ead2002cookbook.html.
hensen, s. l. (1999). nistf ii and ead: the evolution of archival description. in encoded archival description: context, theory, and case studies (pp. 23–34). chicago: the society of american archivists.
library of congress (loc). (2006a). development of the encoded archival description dtd. retrieved october 6, 2006, from www.loc.gov/ead/eaddev.html.
library of congress (loc). (2006b). digital archival object. encoded archival description tag library, version 2002. retrieved january 8, 2007, from www.loc.gov/ead/tglib.
library of congress (loc). (2006c). encoded archival description, version 2002 official site: ead dtd version 2002. retrieved april 19, 2008, from www.loc.gov/ead/ead2002a.html.
meissner, d., kinney, g., lacy, m., nelson, n., proffitt, m., rinehart, r., ruddy, d., stockling, b., webb, m., & young, t. (2002). rlg best practices guidelines for encoded archival description (pp. 1–24). mountain view: rlg. retrieved january 5, 2007, from www.rlg.org/en/pdfs/bpg.pdf.
national library of australia. (1999). use of encoded archival description (ead) for manuscript collections. retrieved january 4, 2007, from www.nla.gov.au/initiatives/ead/eadintro.html.
oclc. (2007). archivegrid: open the door to history. retrieved january 4, 2007, from http://archivegrid.org/web.
oclc. (2008). ead report card. retrieved april 11, 2008, from www.oclc.org/programs/ourwork/past/ead/reportcard.htm.
pearce-moses, r. (2005). a glossary of archival and records terminology. chicago: society of american archivists. retrieved january 8, 2007, from www.archivists.org/glossary/index.asp.
prom, c. j., & habing, t. g. (2002). using the open archives initiative protocols with ead. paper presented at the 2nd acm/ieee-cs joint conference on digital libraries, portland, oregon, usa, july 14–18, 2002. retrieved october 6, 2006, from http://portal.acm.org/citation.cfm?doid=544220.544255.
reese, t. (2005). building lite-weight ead repositories. paper presented at the 5th acm/ieee-cs joint conference on digital libraries. new york: acm. retrieved january 5, 2007, from http://doi.acm.org/10.1145/1065385.1065498.
special collections department, university of virginia. (2004). virginia heritage: guides to manuscripts and archival collections in virginia. retrieved january 7, 2007, from http://ead.lib.virginia.edu/vivaead/.
thomas, c., et al. (2006). best practices guidelines for the implementation of ead version 2002 in florida institutions. florida state university special collections.
university of texas libraries, university of texas at austin. (n.d.). texas archival resources online (taro). retrieved january 4, 2007, from www.lib.utexas.edu/taro.
a dynamic methodology for improving the search experience
marcia d. kerchner
marcia d. kerchner (mkerchner@mitre.org) is a principal information systems engineer at the mitre corporation, mclean, va.
in the early years of modern information retrieval, the fundamental way in which we understood and evaluated search performance was by measuring precision and recall. in recent decades, however, models of evaluation have expanded to incorporate the information-seeking task and the quality of its outcome, as well as the value of the information to the user. we have developed a systems engineering-based methodology for improving the whole search experience. the approach focuses on understanding users' information-seeking problems, understanding who has the problems, and applying solutions that address these problems. this information is gathered through ongoing analysis of site-usage reports, satisfaction surveys, help desk reports, and a working relationship with the business owners.
■ evaluation models
in the early years of modern information retrieval, the fundamental way in which we understood and evaluated search performance was by measuring precision and recall.1 in recent decades, however, models of evaluation have expanded to incorporate the information-seeking task and the quality of its outcome, cognitive models of information behavior, as well as the value of the information to the user.2 the conceptual framework for holistic evaluation of libraries described by nicholson defines multiple perspectives (internal and external views of the library system as well as internal and external views of its use) from which to measure and evaluate a library system.3 the work described in this paper is consistent with these frameworks, as it emphasizes that, while efforts to improve search may focus on optimizing precision or recall, it is equally important to recognize that the search experience involves more than a perfect set of high-precision, high-recall search results. the total search experience, and how well the system actually helps the user solve the search task, must be evaluated. a search experience begins when users enter words in a search box. it continues when the users view some representation (such as a list or a table) of candidate answers to their queries. it includes the users' reactions to the usefulness of those answers and their representation in satisfying information needs, and continues with the users clicking on a link (or links) to view content.
optimizing search results without considering the rest of the search experience and without considering user behavior is missing an opportunity to further improve user success. for example, the experience is a failure if typical users cannot recognize the answers to their information need because the items lack a recognizable title or an informative description, or because they involve extensive scrolling or hard-to-use content.
■ proposed solutions
problems with search, such as low precision or low recall, are often addressed either by metadata solutions (adding topical tags to content objects based on controlled vocabularies) or by replacement of the search engine. the problems with the metadata approach include the time and effort required to establish, evolve, and maintain taxonomies, and the need for trained intermediaries to apply the tags.4 a community of stakeholders may be convened to define the controlled vocabulary, but often the lowest common denominator prevails, the champions and stakeholders leave, and no one is happy with the resulting standard. even with trained intermediaries, inter-indexer inconsistency compromises this approach, and inconsistent term application can cause degradation of search results.5 another shortcoming of the metadata approach is that a specific metadata classification is just a snapshot in time and assumes that there is only one particular hierarchy of the information in the corpus. in reality, however, there is almost always more than one way to describe a concept, and the taxonomy is the view of only one individual or group of individuals. in addition, topical metadata is often implemented with little understanding of the types of queries that are submitted or the probable user search behavior. the other approach to improving search results, replacing the search engine, is not guaranteed to fix the problem because it focuses only on improving precision (and perhaps recall as well) without understanding the true barriers to a successful search experience.
■ irs.gov
irs.gov, one of the most widely used government web sites, is routinely accessed by millions of people each month (more than 27 million visits in april 2005). as an informational site, the key goal of irs.gov is to direct visitors quickly to useful information, either through navigation or a search function. given that there were almost 16 million queries submitted to irs.gov in april 2005, search is clearly a popular way for its users to look for information. this paper offers an alternative to conventional search-improvement approaches by presenting a systems engineering-based methodology for improving the whole search experience. this methodology was developed, honed, and modified in conjunction with work performed on the irs.gov web site over a three-year period.
a similar strategy of "sense-and-respond" for information technology (it) departments of public organizations, involving systematic intelligence gathering on potential customer demand, a rapid response to fulfill that demand, and metrics to determine how well the demand was satisfied, has recently been described.6 the methodology described in this paper focuses on analyzing the information-seeking behaviors and needs of users and on determining the requirements of the business owners (the irs business operating divisions that provide content to irs.gov, such as small business and self-employed, and wage and investment) for directing users to relevant content. it is based on the assumption that a web site must evolve based on its users' needs, rather than expecting users to adapt to its singularities. to support this evolution, the approach leverages techniques for query expansion and document-space modification.7 dramatic improvements in quality of service to the user have resulted, enhancing the user experience at the site and reducing the need to contact the help desk. the approach is particularly applicable for those government, corporate, and commercial web sites where there is some control over the content and usage can be categorized into regular patterns. the rest of this paper provides a case study in the application of the methodology and the application of metrics, in addition to precision and recall, to measure search-experience improvement.
■ conceptual framework
while analysis of search results often focuses on search syntax and search-engine performance, there are actually several steps in the retrieval process, from the user identifying an information need to the user receiving and reviewing query results. as shown in figure 1, finding information is a holistic process. there are several opportunities to improve the whole user experience by fine-tuning this process with a variety of tools, from document engineering to results categorization. once the user and business-owner needs are understood, the appropriate tools to address specific issues can be identified. the tools in our toolkit are described in the following sections.
figure 1. the information retrieval process
document engineering
document engineering includes:
■ document-space modification: modifying the document space by adding terms to content (especially to titles) that are good discriminators and reflect terms commonly entered by users. this approach has the added benefit of making the content more understandable to users.
■ establishment of content-quality standards: defining business processes that improve content quality and organization.
document-space modification
there is significant syntactic and semantic imprecision in the english language. in addition, because of the inadequacies of human or automatic keyword assignment, standard means of representing documents in indexes by statistical term associations and frequency counts, or by adding metadata tags, are not definitive enough to produce a space that is an exact image of the original documents. document-space modification moves documents in the document space closer to future similar queries by adding new terms or modifying the weight of existing terms in the content (figure 2).8 the document space is thus modified to improve retrieval. for irs.gov, rather than adjusting content weights, titles and content are modified to adjust to changing terminology and user needs.
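an illustrative sketch (not the irs.gov implementation) of document-space modification in this spirit: user-vocabulary terms drawn from query logs are folded into a document's title and indexed text so that the document sits closer to the queries people actually type. the document, terms, and field names below are made up.

from collections import Counter

document = {
    "title": "publication 503",
    "body": "rules for claiming the credit for child and dependent care expenses.",
}

# terms from query logs that should act as discriminators for this document
user_terms = ["child care", "dependent care", "daycare expenses"]

def modify_document_space(doc, added_terms):
    """return a copy of the document with user terms folded into the title and index text."""
    enriched = dict(doc)
    enriched["title"] = f'{doc["title"]}: {added_terms[0]}'
    enriched["index_text"] = doc["body"] + " " + " ".join(added_terms)
    # a simple bag-of-words view of the document after modification
    enriched["term_counts"] = Counter(enriched["index_text"].lower().split())
    return enriched

print(modify_document_space(document, user_terms)["title"])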
establishment of content-quality standards
the quality of the search correlates with the quality of the content. improved search results can be achieved by applying good content-creation practices, and retrieval can be significantly improved by addressing problems observed in the content. these problems include inconsistencies in term use—for example, earned income credit (eic) versus earned income tax credit (eitc)—duplicate content, insufficiently descriptive page titles, missing document summaries, misspellings, and inconsistent spellings. processes to improve content quality should establish standards for consistent term usage in content, as well as standards for consistent and descriptive naming of content types (for example, irs types include forms, instructions, and publications). these processes will not only improve search precision but will also help users identify appropriate content in the search results. for example, content entitled "publication 503" in response to the query "child care" may be the perfect answer (with excellent precision and recall), but the user will not recognize it as the right answer. a title such as "publication 503: child and dependent care expenses" will clearly point the user to the relevant information. usability tests conducted in march 2005 for irs.gov confirmed that content organization plays an important role in the perceived success of a user's search experience. long pages of links or scrolling pages of content left some users confused and overwhelmed, unable to find the needed information. for these queries, although the search results were perfect, with a precision of 100 percent after one document, the search experiences were still failures.
query enhancement
the technique of relevance feedback for query expansion improves retrieval in an iterative fashion.9 according to this approach, the user submits a query, reviews the search results, and then reports query-document relevance assessments to the system. these assessments are used to modify the initial query; that is, new terms are added to the initial query (hopefully) to improve it, and the query is resubmitted. if one visualizes the content in a collection as a space (figure 3), this approach attempts to move the query closer to the most relevant content. a drawback of relevance feedback is that it is not generally collected over multiple user sessions and over time, so the next user submitting the same query has to go through the same process of providing results evaluations for query expansion. borlund has noted that, given that an individual user's information need is personal and may change over session time, relevance assessments can only be made by a user at a particular time.10 however, on irs.gov, where there are many common queries for which there is a clear best-guess response, there is valuable relevance information that, if captured once, could benefit tens of thousands of users for specific queries. in fact, in april 2005, the top four hundred queries represented almost half of all the queries. another drawback of the relevance-feedback approach is that it forces the user, novice or expert, to become engaged in the search process. as noted previously, users are generally not interested in becoming search experts or in becoming intimately involved in the process of search. the relevance-feedback approach tries to change users' behavior and forces them to find the specific word or words that will best retrieve the relevant information.
in fact, some research has shown that the potential benefits of relevance feedback may be hard to achieve primarily because searchers have difficulty finding useful terms for effective query expansion.11 to avoid requiring users to submit relevance-feedback judgments, the methodology uses alternative approaches for gathering feedback: (1) mining sources of input that do not require any additional involvement on the part of the users; and (2) soliciting relevance judgments from subject matter experts. as noted above, while best results may be different per task and per user, particularly given the shortness of the queries, our goal is to maximize the good results for the maximum number of people. best-guess results are derived from a variety of sources, including usability testing, satisfaction survey questionnaires, and business content owners. for example, users entering the common query "1040ez" can be looking for information on the form or the form itself. given that—as shown in table 1 (based on the responses of 11,715 users to satisfaction surveys in 2005)—the goal of 39 percent of irs.gov searchers is to download a form as opposed to 28 percent seeking to obtain general tax information, the retrieval of the 1040ez form and its instructions is prioritized, while also retrieving any general related information.
figure 2. document-space modification
figure 3. query modification
we can determine the best-guess results as follows:
■ review the search results for terms that are on the frequently entered search-terms list
■ review help desk contacts, satisfaction-survey comments, and zero-results reports to identify information that users are having trouble finding or understanding
■ identify best results by working with the business owners as necessary
■ analyze why best results are not being retrieved for a particular query
■ add appropriate synonyms for this and related queries
■ engineer relevant documents (as described above)
in this way, the thesaurus, as the source for query enhancement, is an evolving structure that adapts to the needs of the users rather than being a fixed entity of elements based on someone's idea of a standardized vocabulary.
search improvement
we can intercept very popular queries and return a set of preconfigured results or a quick link at the top of the search-results listing. for example, the user entering "1040" sees a list of the most popular 1040-related forms and instructions in addition to a list of other search results. there were more than 31,000 users in april 2005 who requested the i-9 form. since the form is not an irs form, users are presented with a link to the bureau of citizenship and immigration services web site. the tens of thousands of users who look for state tax forms on irs.gov are directed either to the specific state-tax-form page or to a page with links to state tax sites. this unique and user-friendly approach provides a significant improvement over a page that tells the user that there is no matching result, leaving him to fend for himself. another technique for improving search precision (not currently used for irs.gov) is to tune and adjust parameters in the search engine, such as the relative weighting of basic metadata tags such as title (if they are used in the relevance calculation).
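a minimal sketch of the query-interception ("quick link") idea described in the search improvement section above; the mapping of queries to preconfigured results and the stand-in search engine are hypothetical examples, not the actual irs.gov configuration.

# intercept very popular queries and return preconfigured "quick link"
# results ahead of the normal search results. the mapping below is a
# hypothetical illustration.

QUICK_LINKS = {
    "1040": ["form 1040", "form 1040 instructions", "form 1040-es"],
    "i-9": ["link to the bureau of citizenship and immigration services site"],
    "state tax forms": ["page with links to state tax sites"],
}

def search_with_quick_links(query, search_engine):
    query_key = query.strip().lower()
    quick = QUICK_LINKS.get(query_key, [])
    # quick links are shown at the top, followed by the regular result list
    return quick + search_engine(query)

def fake_engine(q):            # stand-in for the real search engine
    return [f"regular result for '{q}'"]

print(search_with_quick_links("1040", fake_engine))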
results-ranking improvement
the search results can be programmatically re-ranked before being presented to the user. this approach (not used as yet on irs.gov) is a variation on the quick links described above for re-ranking more than one result.
categorization
a large set of search results can be automatically categorized into subsets to help the user find the information he needs. in addition, a "search within a search" function is available to help the user narrow down results. research on commercial products that support automatic categorization is planned for the future.
summarization
as noted earlier, a barrier to a successful user experience can be the lack of informative descriptions in the search results. therefore, an important tool for search-experience improvement is to make sure that content titles and summaries are informative, or as a second choice, that the search engine dynamically generates informative summaries. passage-based summaries and highlighted search terms in the summary and the content have become a feature of many commercial search engines as another way to improve the usability of the returned results. in addition, for those pdf publications that lacked informative titles in the title tag, descriptive information from a different metadata field was added to the search display programmatically, which improved the usability of such results significantly.
table 1. reasons for using irs.gov
reason for coming to irs.gov | % of total site visitors | % of total search users
download a tax form, publication, or instructions | 39 | 39
obtain general tax information | 27 | 28
obtain information on e-file | 10 | 10
other | 6 | 6
obtain info on tax regulations or written determinations | 4 | 4
order forms from the irs | 3 | 4
sign up or login to e-services | 3 | 3
link and learn (vita/vce) training | 3 | 3
obtain info on the status of your tax return | 2 | 2
use online tax calculators | 1 | 1
obtain info on revenue rulings or court cases | 1 | 1
obtain an employer identification number (ein) | 1 | —
note: due to rounding, totals may not equal 100%.
■ methodology
the methodology for evolving the search functionality is based on a logical, systems-engineering approach to the issue of getting users the information they seek: understanding the problems, understanding who has the problems, and applying solutions that address the problems. usability studies, weblogs, focus groups, help desk contacts, and user surveys provide different perspectives of the information system. the steps of the methodology are:
1. understand the user population.
2. identify the barriers to a successful search experience.
3. analyze the information-seeking behaviors of the users.
4. understand the needs of the business owners.
5. identify and use the appropriate tools to improve the user's search experience.
6. repeat as needed.
7. monitor new developments in search and analytic technologies and replace the search engine as appropriate.
step 1: understand the user population
the first step is to profile and understand the user population. as mentioned above, an online satisfaction survey was conducted during a six-week period in january–february 2005, to which 11,715 users responded. the users were asked the frequency of their usage of the site, their primary reason for coming to irs.gov, their category (individual, tax professional, business representative), and how they generally find information on irs.gov. as shown in tables 1–4,
76 percent of the irs.gov visitors use it once a month or less (the largest group being those who use it every six months or less) or are using it for the first time; 64 percent are individual taxpayers; 10 percent are tax professionals; 39 percent visit the site to download a form or publication; and 27 percent come for general tax or e-file information. forty-nine percent use the search engine.
table 2. frequency of visits to irs.gov
user group | first time | every six months or less | about once a month | about once a week | daily | more than once a day
site visitor | 29% | 34% | 13% | 13% | 7% | 4%
search user | 26% | 34% | 14% | 14% | 7% | 5%
not surprisingly, 44 percent of the frequent visitors (those who visit once a week or more) are tax professionals, while 72 percent of the infrequent visitors are individuals or those who represent a business. the most common task of both the most frequent and infrequent visitors is to download a form, publication, or instructions, followed by obtaining general tax information. most frequent and infrequent visitors use the search function to locate their information. thus, the largest group of irs.gov users consists of average citizens, unfamiliar with the site, who have a specific question or a need for a specific form or publication. these users require high-precision, highly relevant results, and a highly intuitive search interface. they do not want or need to read all the material generated by their search, but they want their question answered quickly. these users are generally not experienced with sophisticated query language syntax, and because they come to the site no more than once a month, they are not likely to be familiar with its navigational organization. as studies demonstrate, users in general do not want to learn a search engine interface or tailor their queries to the design of a particular search engine.12 they want to find their information now before "search rage" sets in. one study observed that, on average, searchers get frustrated in twelve minutes.13 tax professionals form a small but important group of irs.gov users that includes lawyers, accountants, and tax preparers. they generally use the site on a regular basis, which could be daily, weekly, or monthly. some of these users, particularly lawyers and accountants, require high recall in their search results; it is critical that they retrieve every relevant piece of information (e.g., all the tax regulations) related to a tax topic. they may be willing to sift through large results sets to make sure they have seen all the relevant items. in contrast, many tax preparers use the site primarily to download forms and instructions. while these different sets of users have different levels of expertise using the site and somewhat different precision and recall requirements, they do have one characteristic in common—they are not interested in search for its own sake. approaches to improving retrieval results that focus on forcing users to use tools to refine their query to get presumably better search results (e.g., leveraging the power of boolean or other search syntax) are not desirable in a public web site environment. the complexity of the search must be hidden behind the search box and users must be helped to find information rather than be expected to master a search function.
step 2: identify the barriers to a successful search experience
there are several categories of reasons why finding information on a public web site can be frustrating for the user.
■ mismatch between user terminology and content terminology
- the user search terms may not match the terminology or jargon used in the content (e.g., users ask for "tax tables" or "tax brackets"; the irs names them "tax rate schedules").
- multiple synonymous terms or acronyms are found because different authors are providing content on similar topics (e.g., "ein," "employer identification number," "federal id number"; "eic" versus "eitc").
- users request the same information in a variety of ways (e.g., "1040ez," "1040-ez," "ez," "form1040ez," "1040ez form," "2005 1040ez," "ez1040").
- related content may be inconsistently named, complicating the user's search process (e.g., "1040x" form versus "1040-x" instructions).
- the user may use a familiar acronym that is spelled out in the content (e.g., "poa" for "power of attorney").
■ mismatch between user requests and actual content
- many users ask for information that they expect to find on the site but is actually hosted at another site (e.g., "ds156," a department of state form; "it-201," a new york state tax form).
■ issues with results listing and content display
- content may lack informative titles.
- automatically generated summaries may not be sufficiently descriptive for users to recognize the relevant material in the results listing.
- content may consist of long, scrolling pages, which users find hard to manage.
■ incomplete user queries
- very short search phrases (average length of less than two words) can make it difficult for a search algorithm to deduce the specific content the user is seeking.
step 3: analyze the information-seeking behaviors of the users
site-usage reports, satisfaction surveys, help desk contact reports, zero-results reports, focus groups, and usability studies are valuable sources of information. they should be mined for information-seeking behaviors of the site's users and other barriers to a successful search experience, as follows:
■ review site-usage reports for the most frequently entered search terms and popular pages (both may change over time) and the zero-results search terms. look for:
- new terms
- variations on popular terms
- common misspellings or typos
- common searches, including searches for items not on the site, that could be candidates for preprogrammed "quick links"
- frequently entered terms—review search results to identify candidates for improvement
■ review satisfaction surveys over time
- look for new problems that caused satisfaction to decrease
- analyze answers to questions asking what people could not find, potentially identifying new barriers to success
■ conduct usability studies
- identify issues with the user interface as well as with content findability and usability
■ review help desk contact reports
- identify which topics users are having trouble finding or understanding
table 3. irs.gov user types
type of user | % of total site visitors | % of total search users
individual taxpayer | 64% | 64%
representing a business | 11% | 11%
tax professional | 10% | 11%
representing a charity or nonprofit | 3% | 3%
vita/vce volunteers | 3% | 3%
representing a government entity | 2% | 2%
student | 2% | 1%
irs employee | 1% | 2%
other | 4% | 3%
table 4. how users find information on irs.gov
how do you usually find information on irs.gov? | % of total site visitors
search engine | 49%
irs keyword | 18%
navigation to the web page | 11%
internet search engine (e.g., google, yahoo) | 7%
site map | 5%
other | 4%
bookmarks | 3%
links to irs.gov from other web sites | 3%
step 4: understand the needs of the business owners
the business owners are the irs business operating divisions that provide content to irs.gov, such as small business and self-employed, and wage and investment. it is important to involve them in the process of enhancing the user experience, because they may have specific goals for prioritizing information on a particular topic or may be managing campaigns for highlighting new information. thus it is desirable to:
■ meet with business owners regularly to understand their goals for providing information to users
■ work with them to increase the findability of their content
for example, when an issue in finding a particular content topic is identified (e.g., through an increase in help desk contacts), one approach is to show the business owner the actual results that common queries (based on the site-usage reports) on the topic retrieve and then present suggested alternative results that could be retrieved with a variety of enhancement techniques, such as thesaurus expansion or title improvement. the business owner can then evaluate which set of results presents the content in the most informative manner to the user. steps 1–4 facilitate work behind the scenes to gather the data needed to improve precision and recall and to make information more findable. the remaining steps use these data to adapt proven, widely used techniques for improving search experiences to a web site's specific environment.
step 5: identify appropriate tools to improve the information-retrieval process
as described in the previous section, the tools in our toolkit are document engineering, query enhancement, search improvement, results-ranking improvement, categorization, and summarization.
step 6: repeat as needed
the process of improving the user search experience is ongoing as the site evolves. at irs.gov, different search terms appear on the site-usage reports over time, depending on whether or not it is filing season, or as new content and applications are published. human intervention (with the help of applicable tracking software) is essential for incorporating business requirements, evaluating human behavior, and identifying changing terms.
step 7: monitor new developments in search and analytic technologies and replace the search engine as appropriate
although a new search engine will not address all the issues that have been described, new features such as passage-based summaries and term-highlighting can improve the search experience. of course, one should consider replacing a search engine if new technology can demonstrate significantly improved precision and recall.
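as a concrete illustration of the site-usage-report mining called for in step 3, the following sketch finds the most frequently entered queries and flags frequent zero-result queries as quick-link or synonym candidates. it assumes the usage report can be reduced to a plain list of query strings; the sample data are hypothetical.

from collections import Counter

# mine a query log for the most frequently entered terms and for
# zero-result queries that are candidates for quick links or synonyms.
# the inputs are assumed to be plain query strings from a usage report.

def top_queries(query_log, n=10):
    counts = Counter(q.strip().lower() for q in query_log)
    return counts.most_common(n)

def zero_result_candidates(query_log, zero_result_queries, min_count=5):
    counts = Counter(q.strip().lower() for q in query_log)
    return [(q, c) for q, c in counts.most_common()
            if q in zero_result_queries and c >= min_count]

log = ["1040ez", "EIN", "ein", "where to file", "ein", "it-201", "it-201", "1040ez"]
print(top_queries(log, 3))
print(zero_result_candidates(log, {"it-201"}, min_count=2))  # off-site form, quick-link candidate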
the application of the methodology and the use of the toolkit for irs.gov will be described in the next section.
■ findings
site-usage reports
in 2003, an example of a serious mismatch in user and content terminology was discovered when site-usage reports were analyzed. users entering the equivalent terms ein, employer number, employer id number, and employer identification number retrieved significantly different sets of results. we met with the business owner, who identified a key starting page that should be retrieved along with other highly relevant pages for all of these query terms. we recommended that "ein" be added to the title of the key page because, although ein is a very popular query, the acronym was not used in the content, but was instead spelled out. as a result, the key page was not being retrieved. synonyms were added to the query enhancement thesaurus to accommodate the variants on the ein concept. after these steps were implemented, the results were as follows:
■ for the query ein, the target page moved from #16 to #1
■ for the query ein number, it moved from #17 to #5
■ for the query employer identification number, it moved up to #2 (it was not in the top 20 previously)
■ all search results now retrieved on the first page for these terms were highly relevant
in january 2004, there were approximately twenty thousand queries using these terms, so the search experience has been improved for tens of thousands of users in one month and hundreds of thousands of users throughout the year.
■ review of help desk contacts
help desk reports summarize, for each call or e-mail, the general topic of the user's contact (filing information, employer id number, forms, and publications issues) and the specific question. for example, the report might indicate that a user needed help in finding or downloading the w-4 form or did not understand the instructions for amending a tax return. as help desk contact reports were reviewed, clusters of questions emerged indicating information that many users could not find or understand. by analyzing approximately 9,800 contacts (e-mail, telephone, chat) during a peak five-day period in april 2003, four particular areas were identified that were ripe for improvement: 480 users could not find previous years' forms, which, although they can be found on the site, are not indexed and thus not findable through search; 250 users had questions about where to send their tax returns; 170 users had questions about getting a copy of their tax return or w-2 form; and 77 users had problems finding the 1040x or 1040ez forms. utilizing the information retrieval toolkit, the following improvements were implemented:
a) search for previous years' forms
tool used: results-ranking improvement
a user requesting a previous year's forms (for example, 2002 1099misc) is now presented with a link directly to the page of forms for that specific year, as follows:
recommendation(s) for: 2002 1099misc
■ 2002 forms and publications
2002 forms, instructions, and publications available in pdf format
b) request for filing address
tools used: document engineering and query enhancement
a new "where to file" page was created. synonyms were added to the thesaurus to accommodate the variations on how people make this request (address, where to send, where to mail) and to prioritize retrieval of the "where to file" page.
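a minimal sketch of the thesaurus-based query enhancement used in improvement (b) and in the ein example above; the synonym entries and the "or" expansion syntax are hypothetical illustrations, not the actual irs.gov thesaurus.

# expand a user query with thesaurus synonyms before it is sent to the
# search engine, so that variant phrasings retrieve the same content.
# the synonym entries below are illustrative.

THESAURUS = {
    "where to send": ["where to file"],
    "where to mail": ["where to file"],
    "address": ["where to file"],
    "employer number": ["ein", "employer identification number"],
}

def enhance_query(query):
    q = query.strip().lower()
    expanded = [q]
    for phrase, synonyms in THESAURUS.items():
        if phrase in q:
            expanded.extend(synonyms)
    return " OR ".join(dict.fromkeys(expanded))   # de-duplicate, keep order

print(enhance_query("where to mail my return"))   # adds "where to file"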
c) request for information about obtaining a copy of a tax return or w-2 form
tools used: results-ranking improvement and query enhancement
a "quick link" was created to the target page for getting a copy of returns and w-2 forms, and synonyms were added to the thesaurus to prioritize related content for any query containing the word "copy."
d) requests for 1040x or 1040ez forms or instructions
tool used: query enhancement
synonyms were added to the thesaurus to address both the variations on how users requested the 1040x and 1040ez forms and instructions, and the inconsistencies in the titling of these documents (for example, the form and the instructions have different variations of the compound name).
■ results
in 2004, approximately 4,200 help desk contacts were reviewed during the same time period (the week before april 15) to see whether the changes actually did help users find the information. it should be noted that, during this period from april 2003 to april 2004, many other improvements to the user search experience based on the methodology were deployed. although the number of visits to irs.gov increased by approximately 50 percent compared with the same period in 2003, the total number of contacts with the help desk decreased by 47 percent (there were approximately 9,800 contacts in this period in 2003). the results for the specific improvements are shown in table 5.
table 5. comparison of 2004 and 2003 help desk contacts
problem area | number of contacts 2003 | number of contacts 2004 | change
1040x, 1040ez | 77 | 19 | -75%
prior year forms | 480 | 103 | -78%
copy of return | 170 | 91 | -47%
where to file | 250 | 104 | -58%
total | 977 | 317 | -68%
the average decrease in contacts for those four topics was 68 percent, compared with the overall average decrease of 47 percent. this approach has significantly improved the user experience by identifying and addressing subject areas users have trouble finding or understanding on irs.gov, eliminating the need for them to contact the help desk. as a result, an increase in resources at the help desk was avoided and, hopefully, user satisfaction improved.
■ conclusions
while the case presented in this article was specific to irs.gov, the methodology itself has wide application across domains. customer service for most government and commercial organizations depends on providing users with relevant information effectively and efficiently. there are many aspects to achieving this elusive goal of matching users with the specific information they need. in this paper, it has been demonstrated that, rather than focusing just on optimizing the search engine or developing a metadata-based solution, it is essential to view the user search experience from the time content is created to the moment when users have truly found the answer to their information needs. there is no one surefire solution, and one should not assume that enhanced metadata or a new search engine is the only solution to retrieval problems. the methodology described in this paper assumes that users, especially infrequent users of public web sites, do not wish to become search experts; that intuitive interfaces and meaningful results displays contribute to a successful user experience; and that keeping business owners involved is important. the methodology is based on understanding the behavior of a site's users in order to identify barriers to a successful search experience, and on understanding the needs of business owners. the methodology focuses on adapting the site to its users (rather than vice versa) through document modification, improved content-development processes, query enhancement, and targeted search improvement.
it includes improvements to the results phase of the search process, such as improved titles and summaries, as well as to the search-and-retrieval phase. this toolkit-based approach is effective and low-cost. it has been used over the past four years to improve the user search experience significantly for the millions of irs.gov users. interesting follow-on research could focus on identifying to what degree this methodology can be automated and how to leverage new tools to provide automated support for usage log analysis (such as mondosearch by mondosoft). it is clear from this case study that it is time to apply systems engineering rigor to search-experience improvement. this approach confirms the need to extend metrics for evaluating search beyond precision and recall to include the totality of the search experience.
■ future work
teleporting has been defined as an approach in which users try to jump directly to their information targets.14 trying to achieve perfect search results supports the information-seeking strategy of teleporting. but the search process may involve more than a single search. people often conduct "a series of interconnected but diverse searches on a single, problem-based theme, rather than one extended search session per task."15 this approach is similar to the sport of orienteering, with searchers using data from their present situation to determine where to go next—that is, looking for an overview first and then submitting more detailed searches. given the general, nonspecific nature of the short queries submitted by irs.gov users, the orienteering approach may well describe the information-seeking behaviors of many users. this paper is limited to the improvement of search results for individual searches, but the need to investigate improving the search experience to support orienteering behavior is acknowledged. future research will investigate how to leverage the theoretical models of the information-search process, such as the anomalous states of knowledge (ask) underlying information needs and the information search process model.16
references and notes
1. "common evaluation measures," the thirteenth text retrieval conference, nist special publication sp 500-261 (gaithersburg, md.: national institute of standards and technology, 2004), appendix a.
2. kalervo jarvelin and peter ingwersen, "information-seeking research needs extension towards tasks and technology," information research 10, no. 1 (2004), http://informationr.net/ir/10-1/paper212.html (accessed feb. 2, 2006); k. fisher, s. erdelez, and l. mckechnie, eds., theories of information behavior (medford, n.j.: information today, 2005); t. saracevic and paul b. kantor, "studying the value of library and information services, part i: establishing a theoretical framework," journal of the american society for information science 48, no. 6 (1997): 527–42.
3. scott nicholson, "a conceptual framework for the holistic measurement and cumulative evaluation of library services," journal of documentation 60, no. 2 (2004): 164–82.
4. avra michelson and michael olson, "dynamically enabling search and discovery tem," internal mitre presentation, mclean, va., mar. 30, 2005.
5. lawrence e. leonard, "inter-indexer consistency studies, 1954–1975: a review of the literature and summary of study results," occasional paper series, no. 131, graduate school of library science, university of illinois, urbana-champaign, 1977; tefko saracevic, "individual differences in organizing, searching and retrieving information," in proceedings of american society for information science '91 (new york: john wiley, 1991), 82–86; g. furnas et al., "the vocabulary problem in human-system communication," communications of the acm 30, no. 11 (1987): 964–71.
6. rajiv ramnath and david landsbergen, "it-enabled sense-and-respond strategies in complex public organizations," communications of the acm 48, no. 5 (2005): 58–64.
7. t. l. brauen et al., "document indexing based on relevance feedback," report no. isr-14 to the national science foundation, section xi, department of computer science, cornell university, ithaca, n.y., 1968; m. c. davis, m. d. linsky, and m. v. zelkowitz, "a relevance feedback system employing a dynamically evolving document space," report no. isr-14 to the national science foundation, section x, department of computer science, cornell university, ithaca, n.y., 1968; marcia d. kerchner, dynamic document processing in clustered collections, report no. isr-19 to the national science foundation, ph.d. thesis, department of computer science, cornell university, ithaca, n.y., 1971.
8. ibid.
9. gerard s. salton, dynamic information and library processing (englewood cliffs, n.j.: prentice-hall, 1975).
10. p. borlund, "the iir evaluation model: a framework for evaluation of interactive information retrieval systems," information research 8, no. 3 (2003), http://informationr.net/ir/8-3/paper152.html (accessed feb. 15, 2006).
11. ian ruthven, "re-examining the effectiveness of interactive query expansion," in proceedings of the 26th international acm sigir conference on research and development in information retrieval (new york: acm press, 2003), 213–20.
12. marc l. resnick and rebecca lergier, "things you might not know about how real people search," 2002, www.searchtools.com/analysis/how-people-search.html (accessed oct. 1, 2005).
13. danny sullivan, "webtop search rage study," the search engine report, 2001, http://searchenginewatch.com/sereport/article.php/2163451 (accessed sept. 10, 2005).
14. j. teevan et al., "the perfect search engine is not enough: a study of orienteering behavior in directed search," in proceedings of computer-human interaction conference '94 (new york: acm press, 2004), 415–22.
15. vicki o'day and robin jeffries, "orienteering in an information landscape: how information seekers get from here to there," in proceedings interchi '93 (new york: acm press, 1993), 438.
16. n. j. belkin, r. n. oddy, and h. m. brooks, "ask for information retrieval, part i. background and theory," the journal of documentation 38, no. 2 (1982): 61–71; n. j. belkin, r. n. oddy, and h. m. brooks, "ask for information retrieval, part ii. results of a design study," the journal of documentation 38, no. 3 (1982): 145–64; carol c. kuhlthau, seeking meaning: a process approach (norwood, n.j.: ablex, 1993).
lib-mocs-kmc364-20131012122710 292 journal of library automation vol. 14/4 december 1981
we need a format which is consistent, easily maintainable without being uncontrollably disruptive, and responsive to changing needs which are likely to accelerate as we gain experience with online systems.
rather than recommending or supporting the implementation of specific changes to the marc format, it is essential that the library community begin to establish the framework and benchmarks necessary to maintain the marc formats over the long term as well as to guide short-term considerations. arl and others can play an important role in undertaking and encouraging a broader approach to this pressing problem. such an approach will not only reduce the risk of decision making, but will also assist in the development of the cost/benefit data needed to enhance consideration of format changes.
comparing fiche and film: a test of speed
terence crowley: division of library science, san jose state university, san jose, california.
introduction
for more than a decade librarians have been responding to budget pressures by altering the format of their library catalogs from labor-intensive card formats to computer-produced book and microformats. studies at bath, 1 toronto, 2 texas, 3 eugene, 4 los angeles, 5 and berkeley, 6 have compared the forms of catalogs in a variety of ways ranging from broad-scale user surveys to circumscribed estimates of the speed of searching and the incidence of queuing. the american library association published a state-of-the-art report 7 as well as a guide to commercial computer-output microfilm (com) catalogs pragmatically subtitled how to choose; when to buy. 8 in general, com catalogs are shown to be more economical and faster to produce and to keep current, to require less space, and to be suitable for distribution to multiple locations. primary disadvantages cited are hardware malfunctions, increased need for patron instruction, user resistance (particularly due to eyestrain), and some machine queuing. the most common types of library com catalogs today are motorized reel microfilm and microfiche, each with advantages and disadvantages. microfilm offers file-sequence integrity and thus is less subject to user abuse, i.e., theft, misfiling, and damage; in motorized readers with "captive" reels it is said to be easier to use. disadvantages include substantially greater initial cost for motorized readers; limits on the capacity of captive reels necessitating multiple units for large files; inexact indexing in the most widespread commercial reader; and eyestrain resulting from high-speed film movement. microfiche offers a more nearly random retrieval, much less expensive and more versatile readers, and unlimited file size. conversely, the file integrity of fiche is lower and the need for patron assistance in use of machines is said to be greater than for self-contained motorized film readers.
the problem
one of the important considerations not fully researched is that of speed of searching.
the toronto study included a self-timed "look-up" test of thirty-two items "not in alphabetical order" given to thirty-six volunteers, of whom thirty finished the test. the researchers found the results "inconclusive" but noted that seven of the ten librarians found film searching the fastest method. "average" time reported for searching in card catalogs was 37.3 minutes, in film catalogs 41.6 minutes, and for fiche catalogs 41.7 minutes. a reanalysis of the original data shows a stronger advantage of fiche over film (45.3 minutes versus 51.7 minutes) when all times except duplicates are totaled, but that difference is almost entirely due to one extreme score (203 minutes). 9 the berkeley report of fiche/film comparability addressed the issue of retrieval speed directly. by constructing a series of look-up tests composed of items selected from a large public library com catalog, the researchers were able to compare microfiche and microfilm formats while holding other variables constant. in one test involving thirty-six paid users and 252 trials, microfilm was determined to be faster by 7.6 percent (±2.5 percent). in a second test, forty volunteer users were timed in 240 trials and the advantage of film over fiche dropped to 5.7 percent (±2.5 percent). 10 although rigorous in design and execution, the berkeley experimenters used in their look-up tests questions that naive users might misinterpret, e.g., "you want a book about paul robeson, written by eloise greenfield. find the listing and give the call number"; and some which could be confusing, e.g., "does the library have any joke books? if so, give the call number for one." 11 such questions potentially pose an element of uncertainty for subjects: should i look under robeson or greenfield? under joke books or humor? in addition, questions were selected by "browsing the file for target items," a procedure which could result in an uneven distribution of items which in turn could bias the results. since the number of observations is relatively large the reliability of the results is not questioned; the validity may be. the study reported here was executed by a class in research methods taught by the author during the same time as the berkeley study; we used the same two formats of the same catalog, and attempted to answer the same question: using the best available equipment, which microformat is faster to search?
assumptions
we assumed (1) the two forms of the catalog were identical; (2) the quality of the image was not significantly different; (3) a search for items selected randomly from the file and arranged randomly was a fair test of retrieval speed; and (4) graduate students in library science were reasonably representative users for a test of speed.
methodology
we used a dictionary catalog from a public library system with 436,791 entries, of which 5,631 were author, 111,158 were title or added entries, and 320,002 were subject entries. using a random number table, we selected from the catalog 16 entries which were reproduced and randomly arranged to form the test. of the 16 items, 3 were author entries, 8 were title or added entries, 5 were subject entries. the sequence, which presumably would affect the speed of retrieval more in the film format because of the necessity to scroll from one letter to another, was a, c, w, n, s, k, c, b, w, m, h, l, p, p, a, l. the test was then administered to thirty-seven volunteer graduate students randomly assigned to a micro-design 4020 fiche reader or an information design rom 3 film reader.
the two readers were located in the same room. the 86 fiche were held and displayed by a ring king binder. all times were measured by a stopwatch. questionnaires administered before and after the test established that the two groups did not differ significantly in age or in self-perceived mechanical ability. of the film users, 64 percent used microformats "occasionally" or "frequently" compared with 35 percent of the fiche users. of the total group, 73 percent wore glasses and 62 percent reported prior physical problems with both film and fiche readers used before the test.
results
table 1 shows that the mean speed of the film users was 16.7 minutes, significantly faster than the 25.3 minutes recorded by the fiche users; the range of speed for the film users was less than one-third that of the fiche users. even the slowest film user was faster than 70 percent of the fiche users. however, the fastest fiche user was faster than 70 percent of the film users. the range of fiche scores is more than 3 times that of the film scores (figure 1). the standard statistical test shows the difference of means to be significant at the .01 level.
table 1. speed of retrieval (in minutes)
format | low
microfilm (n = 17) | 12.3
microfiche (n = 20) | 14.6
t = 4.8, p < .01
discussion
searching motorized microfilm appears to be significantly faster than searching microfiche, on the average, for relatively inexperienced users. even the slowest time on the film was faster than most fiche times. the wide range of fiche scores suggests the possibility that frequent users could improve their searching times; very experienced users may be able to search fiche faster than film.* because of the relatively small numbers of subjects and observations
*the author, an experienced fiche user, was timed at 11.6 minutes; this was the fastest time recorded by either fiche or film users.

that we call "satisfaction frequency." it represents the regularity with which a particular preference value has been used in alerts positively evaluated by the user. this frequency measures the relative importance of the preferences stated by the user and allows the interface agent to generate a ranking list of results. the range of possible values for these frequencies is defined by a group of seven labels that we get from the fuzzy linguistic variable "frequency," whose expression domain is defined by the linguistic term set s = {always, almost_always, often, occasionally, rarely, almost_never, never}, being the default value and "occasionally" being the central value.
rss feeds
thanks to the popularization of blogs, there has been widespread use of several vocabularies specifically designed for the syndication of contents (that is, for making accessible to other internet users the content of a website by means of hyperlink lists called "feeds"). to create our current-awareness bulletin we use rss 1.0, a vocabulary that enables managing hyperlink lists in an easy and flexible way. it utilizes the rdf/xml syntax and data model and is easily extensible because of the use of modules that enable extending the vocabulary without modifying its core each time new describing elements are added.
figure 1. sample entry of a skos core thesaurus
figure 2. user profile sample
in this model several modules are used: the dublin core (dc) module to define the basic bibliographic information of the items utilizing the elements established by the dublin core metadata initiative (http://dublincore.org); the syndication module to facilitate software agents synchronizing and updating rss feeds; and the taxonomy module to assign topics to feed items. the structure of the feeds comprises two areas: one where the channel itself is described by a series of basic metadata like a title, a brief description of the content, and the updating frequency; and another where the descriptions of the items that make up the feed (see figure 3) are defined (including elements such as title, author, summary, hyperlink to the primary resource, date of creation, and subjects).
figure 3. rss feed item sample
recommendation log file
each document in the repository has an associated recommendation log file in rdf that includes the listing of evaluations assigned to that resource by different users since the resource was added to the system. each of the entries of the recommendation log files consists of a recommendation value, a uri that identifies the user that has done the recommendation, and the date of the record (see figure 4). the expression domain of the recommendations is defined by the following set of five fuzzy linguistic labels that are extracted from the linguistic variable "quality of the resource": q = {very_low, low, medium, high, very_high}.
figure 4. recommendation log file sample
these elements represent the raw materials for the sdi service that enable it to develop its activity through four processes or functional modules: the profiles updating process, rss feeds generation process, alert generation process, and collaborative recommendation process.
system processes
profiles updating process
since the sdi service's functions are based on generating passive searches to rss feeds from the preferences stored in a user's profile, updating the profiles becomes a critical task. user profiles are meant to store long-term preferences, but the system must be able to detect any subtle change in these preferences over time to offer accurate recommendations. in our model, user profiles are updated using a simple mechanism that enables finding users' implicit preferences by applying fuzzy linguistic techniques and taking into account the feedback users provide. users are asked about their satisfaction degree (e_j) in relation to the information alert generated by the system (i.e., whether the items retrieved are interesting or not). this satisfaction degree is obtained from the linguistic variable "satisfaction," whose expression domain is the set of seven linguistic labels: s' = {total, very_high, high, medium, low, very_low, null}. this mechanism updates the satisfaction frequency associated with each user preference according to the satisfaction degree e_j. it requires the use of a matching function similar to those used to model threshold weights in weighted search queries.31 the function proposed here rewards the frequencies associated with the preference values present when the resources assessed are satisfactory, and it penalizes them when the assessment is negative.
let e_j ∈ s' be the degree of satisfaction and f_i^l ∈ s the satisfaction frequency of property i (in this case i = "preference") with value l, the labels s_a, s_b of these term sets being indexed by a, b ∈ {0, ..., t}; we then define the updating function g as a mapping g: s' × s → s, with g(e_j, f_i^l) = ...
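the piecewise definition of g is cut off at this point in the text, so the following sketch assumes a simple illustrative rule in its place: the satisfaction-frequency label attached to a preference is shifted one step toward "always" when satisfaction is above the central label and one step toward "never" when it is below. the label orderings and the shift-by-one rule are assumptions, not the published definition.

# illustrative sketch of the profile-updating idea: reward or penalize the
# satisfaction frequency attached to a preference value according to the
# user's satisfaction with the alert. the shift-by-one rule is an assumption.

FREQUENCY = ["never", "almost_never", "rarely", "occasionally",
             "often", "almost_always", "always"]          # term set S
SATISFACTION = ["null", "very_low", "low", "medium",
                "high", "very_high", "total"]             # term set S'

def update_frequency(satisfaction, frequency):
    """assumed form of g: S' x S -> S: move the frequency label toward
    'always' when satisfaction is above 'medium', toward 'never' when below."""
    s = SATISFACTION.index(satisfaction)
    f = FREQUENCY.index(frequency)
    center = SATISFACTION.index("medium")
    if s > center:
        f = min(f + 1, len(FREQUENCY) - 1)   # reward
    elif s < center:
        f = max(f - 1, 0)                    # penalize
    return FREQUENCY[f]

print(update_frequency("very_high", "occasionally"))  # -> 'often'
print(update_frequency("low", "occasionally"))        # -> 'rarely'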